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Abstract 


\  JAMES,  DOUGLAS.  Conjugate  Gradient  Methods  for  Constrained  Least 
uares  Problems  (directed  by  Robert  J.  Plemmons).  . 

^  1988,  Barlow,  Nichols,  and  Plemmons  proposed  ap  order-reducing  con¬ 
jugate  gradient  algorithm  for  solving  constrained  least  squares  problems.  They 
pt  jved  that  this  method,  which  we  call  Algorithm(BN^  is  superior  to  p-cyclic 


SORjn  exact  arithmetic^  •  'jpSu'/c: ■^Ss\n/c  OV-tpre U  j 


j^ere  we  continue  the  study  of  Algorithm  BNP.  We  identify  and  correct  a 
source  of  instability  in  the  original  algorithm,  and  develop  a  parallel  version 
suitable  for  substructured  probleou.  We  prove  that  BNP  is  superior  to  block 
accelerated  overrelaxation  (AOR),  and  establish  a  connection  between  BNP  and 
a  preconditioned  form  of  the  weighting  method.  We  also  show  that  BNP  can 
be  viewed  as  a  nullspace  method,  in  which  a  distinguished  choice  of  the  basis 
for  the  nullspace  is  used  but  never  formed.  Finally,  we  exploit  this  nullspace 
characterization  to  extend  BNP,  produdng  a  class  of  algorithms  we  call  implicit 
nullspace  methods.  These  methods  allow  great  flexibility  in  the  choice  of  pre¬ 
conditioner,  and  can  be  used  to  solve  problems  for  which  BNP  is  not  well  suited. 
Like  BNP,  the  extensions  are  suitable  for  parallel  implementation  on  substruc¬ 
tured  problems.  Experiments  on  structural  engineering  and  Stokes  Flow  modeb 
suggest  that  BNP  and  its  extensions  offer  a  competitive  alternative  to  existing  — 
iterative  algorithms  for  solving  constrained  least  squares  problems. 


The  appendix  describes  a  mechanism  which  can  cause  the  breakdown  of  ed 
incomplete QR factorizations.  — 

AlAOrr^W. 

^  -  l\  '  I  A  .  ..  t  /  I  _  I 
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“I  don’t  know  enough,”  replied  the  Scarecrow  cheerfully.  “My  head 
is  stuffed  with  straw,  you  know,  and  that  is  why  I  am  going  to  Oz  to 
ask  him  for  some  brains.” 

“Oh,  I  see,”  said  the  Tin  Woodman.  “But,  after  all,  brains  are  not 
the  best  things  in  the  world.” 

“Have  you  any?”  inquired  the  Scarecrow. 

“No,  my  head  is  quite  empty,”  answered  the  Woodman;  “but  once  I 
had  brains,  and  a  heart  also;  so,  having  tried  them  both,  I  should  much 
rather  have  a  heart.” 


L.  Frank  Baum 
The  WixardofOz 
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1.  Introduction 


In  this  thesis,  we  cunsidei  the  solution  of  the  equality  constrained  least 
squares  problem,  or  problem  LSE:  given  an  mxxn  matrix  E,  an  mjxn 
matrix  Gr,  an  mixl  vector  and  an  m^xl  vector  c, 

minimize  \\Gy  ~  clis  v.uch  that  Ey  =  b,  (l.I) 

We  place  c  ditious  on  the  problem  guaranteeing  a  unique  solution,  and 
presume  that  toe  matrices  are  large  and  sparse,  so  that  it  b  plausible  to  consider 
iterative  algorithms.  All  assumptirms  are  realistic  for  a  wide  range  of  important 
applications. 

We  will  also  need  a  number  of  equivalent  formulations  of  the  problem.  The 
so-called  Kuhn-T^cker  formulation  U  the  starting  point  for  each  of  the  al¬ 
gorithms  we  discuss,  while  the  constrained  minimization  and  saddle  point 
forms  b  the  natural  setting  for  many  physical  ^plications. 

The  motivating  example  for  our  work  b  the  static  analysb  of  engineering 
structures:  given  a  large  structxire  subjected  to  an  external  load,  find  the  in¬ 
ternal  forces  at  equilibrium.  When  the  problem  b  modelled  using  the  force 
method  (see,  for  example,  [17]),  the  constraint  Ey  sab  captures  the  fact  that 
the  forces  sum  to  zero  at  any  node  in  the  structure  (the  equilibrium  condi¬ 
tion).  We  then  look  for  the  unique  set  of  internal  forces  which  minimizes  a 
type  of  potential  energy  subject  to  thb  constraint.  Thb  application  b  but  one 


^cample  of  a  more  general  physical  principle  at  work:  minimizing  an  energy 
functional  subject  to  an  equilibrium  constraint  is  a  central  idea  throughout  the 
physical  sciences  (see  Strang  [34],  [35]),  so  problem  LSE  in  its  various  equiv¬ 
alent  forms  has  a  wide  range  of  applications.  We  use  as  a  second  example  a 
discretization  of  a  Stokes  Flow  problem  leading  to  a  saddle  point  system. 

Our  study  of  problem  LSE  and  its  equivalent  forms  centers  around  an  al¬ 
gorithm  first  proposed  and  analyzed  by  Barlow,  Nichols^  and  Plemmons  [4]. 
rhe  derivation  of  this  algorithm,  which  we  call  Algorithm  BNP,  starts  with 
a  repartitioned  form  of  the  Kuhn-Tucker  equations  which  has  square,  non- 
singular,  and  easily  invertible  diagonal  blocks.  The  authors  rewrite  this  system 
by  rqtplying  block  Gauss  elimination,  producing  a  symmetric  positive  definite 
su1>problem  with  the  n-m\  leading  components  of  the  residual  r  ss  c  —  Gy  as 
the  unknowns.  They  then  apply  a  variation  of  the  conjugate  gradient  algorithm 
to  this  sub-problem,  generating  at  each  iteration  an  approximation  to  the  orig¬ 
inal  unknown  y.  The  method  is  order-  or  dimension-redudog  in  the  sense 
that  the  sub-problem  hu  fewer  unknowns  than  the  original  problem  in  y. 

Barlow,  Nichob,  end  Plemmons  prove  in  [4]  that  Algorithm  BNP  is  superior 
to  so-called  p-cyclic  SOR  methods  in  a  certain  well-defined  sense.  We  extend 
this  result,  proving  the  algorithm  is  also  superior  to  a  two-parameter  general¬ 
ization  of  SOR  known  as  Block  Accelerated  Overrelaxation  or  AOR.  We 
also  establish  a  connection  between  BNP  and  a  preomditioned  version  of  the 
ciaasical  weighting  method.  In  addition,  we  identify  and  correct  a  source  of 
inaccuracy  in  the  original  vision  of  algorithm  BNP. 

Our  main  purpose,  howeve,  is  to  extend  the  algorithm  to  two  types  of 
problems  for  which  Algorithm  BNP  is  not  well  suited.  In  theory,  BNP  requires 
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fairly  mild  assiunptions  on  problem  LSE.  In  practice,  however,  implementation 
is  difficult  unless  the  matrix  G  has  full  column  rank.  This  assumption  holds  for 
many  but  not  all  applications  (it  fails  to  hold,  for  example,  if  any  of  the  elements 
in  the  structures  problem  fails  to  behave  elastically).  The  algorithm  may  also 
be  difficult  to  s^ply  to  problems  expressed  in  saddle  point  form  (e.g.  Stokes 
flow).  We  propose  an  extension  to  BNP  which  can  overcome  these  limitations. 

The  key  to  the  extension  is  recognizing  a  connection  between  algorithm  BNP 
and  the  classical  nullspace  method.  The  latter  begins  with  some  convenient 
particular  solution  j/p  to  the  construnt,  and  a  matrix  N  whose  columns  form 
a  basis  for  ti\e  nvMspace  of  E,  One  then  seeks  a  coordinate  vector  x  such  that 
y  rs  *<p4  iVx  is  the  solution  to  the  constrained  problem.  We  show  that  BNP  may 
be  viewed  as  a  v'.^iation  of  the  nullspace  method,  in  which  distinguished  choices 
of  yp  and  N  are  used  bnt  never  formed  (we  will  call  such  a  method  an  implicit 
aullipace  method,  or  IPTM).  The  basis  matrix  N  is  seen  to  be  acting  as  a 
pieconditioner  for  a  aet  of  normal  equations  in  jEacb  red  form.  We  generalize 
algorithm  BNP  by  producing  implicit  nullspace  methods  for  other  choices  of 

The  extension  preserves  the  spirit  of  BN^  (in  particular,  the  algorithms  are 
order-reducing),  but  is  somewhat  more  doxible:  it  becomes  relatively  easy  tc 
construct  preconditiontos  for  problems  in  which  G  lacks  full  column  rank,  as 
well  as  problems  in  saddle  p<dnt  form.  Bo'^  BNP  and  the  r\o:e  general  implidt 
nuUnpace  algOTtthms  can  be  implmnented  n  parallel  when  the  matrices  reflect 
a  substnicturing  (domain  decompositi  vn)  of  the  physical  model. 

Chapter  2  prorides  an  overview  of  the  propertin  of  problem  LSE  and  its 
equivalent  forms.  We  also  show  how  both  the  structures  and  Stokei  Flow  prob- 
lenu  can  be  expressed  in  terms  of  these  f(»m8,  and  diso^  the  special  structure 
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of  the  underlying  matrices  for  substructured  problems.  Chapter  3  provides  an 
overview  of  each  of  the  four  basic  iterative  algorithms  we  discuss:  p>cyclic  SOE, 
block  AOR,  a  preconditioned  form  of  the  weightmg  method,  and  of  course  Algo¬ 
rithm  BNP  itself.  We  include  a  detailed  look  at  the  construction  of  the  modified 
Kuhn-Tucker  equations  needed  to  implement  all  of  these  algorithms.  In  chap¬ 
ter  4  we  compare  and  contrast  the  methods:  we  summarize  the  results  in  [4] 
concerning  BNP  and  p-cydic  SOR;  we  extend  these  results  to  show  that  BNP 
is  superior  to  block  AOR;  and  we  show  that  BNP  can  be  viewed  as  the  limiting 
case  of  the  preconditioned  weighting  method. 

The  extension  of  BNP  is  the  topic  of  chapter  5:  we  establish  the  relation¬ 
ship  between  BNP  and  the  nuUspace  method,  then  exploit  this  connection  to 
derive  the  more  general  implicit  nullspace  methods.  We  then  provide  a  menu 
of  possible  INM  algorithms.  In  chapter  6  we  summarize  the  results  of  our  nu¬ 
merical  experiments.  We  include  tests  on  a  variety  of  structural  engineering 
problems,  including  problems  which  violate  the  assumptions  of  elasticity.  We 
also  describe  the  performance  of  various  forms  of  INM  on  a  Stokes  Flow  prob¬ 
lem.  In  each  case,  we  include  results  of  experiments  involving  parallel  versions 
of  the  algorithms. 

We  offer  some  concluding  remarks  m  chapter  7.  Finally,  in  the  appendix, 
we  describe  a  mechanism  which  can  cause  the  breakdown  of  certain  types  of 
incomplete  QR  factorizations  proposed  in  the  literature. 


4 


2,  Formulation  of  the  Problem 


This  chapter  is  an  overview  of  the  equality  constrained  least  squares  problem. 
In  §1.1  we  mention  some  of  the  basic  properties  of  problem  LSE,  state  the 
assumptions  under  which  we  proceed,  and  express  the  problem  in  four  equivalent 
forms.  The  material  in  §1.1  is  largely  a  distillation  of  more  complete  discussions 
in  Bjorck  [7],  Golub  and  van  Loan  [12],  and  Strang  [35].  The  next  two  sections 
include  descriptions  of  two  physical  applications:  the  static  analysis  of  structures 
(§1.2)  and  a  finite  difference  discretization  of  a  Stokes  Flow  problem  (§1.3).  In 
§1.4,  we  use  these  two  examples  to  explain  the  effect  of  substructuring  (domain 
decomposition)  on  the  structure  of  the  underlying  matrices. 

2.1  Problem  LSE  and  Equivalent  Forms 

Recall  from  the  introduction  the  statement  of  problem  LSE:  given  an  mixn 
matrix  an  maxn  matrix  G,  an  mixl  vector  6,  and  an  maXl  vector  c, 

minimize  ||Gy » cjja  such  that  Ey  s  6.  (2.1) 

We  make  two  plausible  assumptions  before  proceeding: 

HI:  £  has  full  row  rank,  so  6  b  in  the  range  of  E  and  the  problem  has  at 
least  one  solution;  and 
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H2:  the  nuUspaces  of  E  and  G  intersect  trivially  (or,  equivalently,  |  ^  j 
full  column  rank),  guarantedng  that  the  solution  is  unique. 

Two  types  of  manipulations  will  prove  especially  useful.  Given  any  non¬ 
singular  matrix  S,  we  can  replace  the  constraint  Ey  bin  problem  LSE  with 
the  new  constraint  BEy  —  Bb.  This  means  we  can  apply  Gauss  elimination  or 
orthogonal  reduction  to  E  and  b  without  affecting  the  solution  y;  we  can  use 
this  fact  to  reduce  E  to  some  convenient  ‘‘canonical”  form.  Similarly,  let  Q  be 
an  orthogonal  matrix.  Then,  since  premultiplication  by  Q  preserves  the  2-norm, 
we  can  replace  the  quantity  {Gy  —  c)  in  LSE  with  {QGy  -  Qc).  Thus  we  can 
apply  orthogonal  rotations  (but  not  Gauss  elimination)  to  the  rows  of  G  and 
e  without  affecting  the  solution.  Additionally,  we  can  scale  (Gy  —  c)  by  any 
constant  multiple  of  the  identity  matrix. 

We  will  also  make  frequent  use  of  the  three  equivalent  formulations  of  LSE 
described  below. 

Constrained  Minimization  Problem.  Note  that 

llGy  -  c||l  =  y^(^Gy  -  2y^G^c  +  (Fc,  (2.2) 

and  that  the  term  <Fc  has  no  effect  on  the  value  of  y  at  which  the  quantity  is 
minimized.  With  this  in  mind,  define 

P(y)  =  yrG^Gy-2y^G^c,  (2.3) 

noting  that  y  satisfies  problem  LSE  if  and  only  if  y  is  the  solution  of  the  con¬ 
strained  minimization  problem: 

minimize  P(y)  such  that  Ey^b.  (2.4) 


6 


Kuhn-Tucker  Formulation.  Let  rszc— Gy  represent  the  residual  vector  we 
seek  to  minimize.  Introduce  the  miXl  Lagrange  multiplier  A  (one  component 
for  each  constraint),  and  define  the  functional 

+  (2-5) 

Here  ^  is  as  in  equation  (2.3).  Under  assumptions  HI  and  H2,  the  solution 
y  to  problem  LSE  is  part  of  the  ordered  pair  (y,  A)  defining  the  unique  saddle 
point  of  this  functional.  One  can  find  this  saddle  point  by  solving  the  so-called 
Kuhn-Tucker  equations: 

E  0  0  1  fy  ]  [6' 

G  /  0  r  =  c  .  (2.6) 

Q  X  0 

«  w  m  mm 

The  first  block  equation  in  this  system,  which  is  just  the  constraint  Ey  m  6, 
results  from  differentiating  <f>  with  respect  to  each  component  of  A,  and  setting 
the  result  equal  to  zero.  The  second  equation  simply  defines  the  residual  r.  The 
third  equation  comes  from  differentiating  ^  with  respect  to  each  component  of 
y.  The  Kuhn-Tucker  formulation  is  the  starting  point  for  most  of  the  algorithms 
of  concern  to  us. 

Saddle  Point  Formulation.  Let  F  =  and  $  =  -G^c.  Note  that  the 
functional  V  defined  in  equation  (2.3)  can  then  be  rewritten  as 

P(y)  =  y^Fy  +  2y^s.  (2.7) 

Introduce  the  Lagrange  multiplier  and  define  the  functional 

!»(»,>)  =  ■?(»)-/(£»-?)•  (2.8) 
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Once  again  the  solution  to  LSE  is  associated  with  the  seiddle  point  of  this 
functional;  thus  one  can  find  the  solution  to  problem  LSE  by  solving  the  linear 
system  which  defines  this  saddle  point: 

r  -F  1  r  if  1  _ 

I  £  0 

'  We  chose  the  signs  in  the  definitions  of  /i,  s  and  ^  so  the  matrix  E  is  the  same 
in  all  four  formulations.  The  multiplier  n  b  related  to  A  in  the  Kuhn-Tucker 
equations  by  /x  s  -A. 

2.2  First  Example:  Static  Analysis  of  Engi¬ 
neering  Structures 

The  motivating  example  for  our  work  b  the  static  analysb  of  engineering  struc* 
tures:  given  a  physical  structure  subject  to  an  external  load,  find  the  internal 
forces  at  equilibrium.  In  thb  section,  we  use  a  simple  example  to  show  how 
to  model  this  problem  as  a  constrained  least  squares  problem.  We  employ  the 
so-called  force  method  (see,  for  example,  Heath,  Plemmons,  and  Ward  [17], 
Przemieniecki  [32],  or  virtually  any  text  on  structural  analysis).  Much  of  the 
material  below  b  a  summary  of  the  disctiseion  in  section  2.4  of  Strang  [35]. 

Consider  the  truss  depicted  in  figure  2.1.  The  elements  in  the  structure 
are  rigid  bars.  We  assume  the  bars  cannot  bend;  longitudinally,  they  behave 
as  if  they  were  very  stiff  springs,  reacting  to  external  loads  by  compressing  or 
elongating.  Thus,  associated  with  each  element  i  there  b  a  single  unknown  yi 
representing  the  internal  force  in  th^  element  when  the  structure  as  a  whole 
b  at  equilibrium.  We  establish  the  convention  that  a  positive  yi  represents  an 
element  in  tension. 
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The  elements  are  connected  to  eadk  oth<»^  by  pins;  the  locations  of  these 
pins  are  called  nodes.  We  assume  the  problem  is  modelled  so  that  external 
forces  act  only  at  the  nodes;  in  this  example,  there  is  a  single  unit  force  acting 
downward  on  node  5.  The  nodes  at  the  southwest  and  southeast  comers  of 
the  truss  are  supported  or  *grounded’*  in  the  sense  that  pin  joints  bolt  them 
to  an  immobile  base  (this  means  we  are  not  free  to  specify  an  ext^nal  loading 
at  these  nodes;  the  force  on  a  fixed  node  b  that  necessary  to  prevent  the  node 
from  moving).  The  other  five  nodes  are  free  to  nmve  b  the  x-s  plane  as  the 
elements  lengthen  or  contract  under  the  load*  We  assume,  however,  that  the 
dUplacements  at  these  free  nodes  are  very  smalL  We  also  presume  that  the 
structure  is  stable:  there  ate  enough  properly  placid  dements  to  prevent  the 
structure  from  collapsing  on  itsdf,  and  it  is  supported  b  sudi  a  way  that  there 
is  no  set  of  external  loads  whidi  causes  a  rigid  motion. 

If  the  structure  is  at  equilibrium,  then  the  forces  aetbg  U  each  node  must 
sum  to  zero  (the  equilibrium  condition),  b  the  example,  we  therefore  begjn 
by  writing  a  force  balance  m  both  the  x  and  z  coordinate  diiedions  for  each  of 
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Figure  2.3:  Equilibriiua  Coosireint  =  6  for  Pin-jointed  Thus 


the  five  £ree  nodes.  At  node  5,  for  insUncet  the  equilibrium  condition  gives  us 
the  two  equetions 

+A/l^->/^^yio~yti  *  0  (*  direction) 

— ^ys  —  y/^  3B  1  (*  direction).  *** 


We  now  write  the  collection  of  ell  ten  equntioiyi  in  metiix  form:  etch  of  the  five 
free  nodes  ocmtiibutes  4  block  of  two  rows  (ooe  for  etch  direction  La  which  the 
node  is  free  to  move).  The  result  is  the  constraint  £y  «  6  in  problem  LSE.  The 
full  system  for  the  truss  in  figure  2.1  appears  ia  figure  2.2. 

The  vector  6  stores  the  spedfied  external  force  vector.  The  matrix  £,  com¬ 
monly  called  the  equilibrium  matrix,  holds  the  non-sero  coefficients  In  the 
force  balance  equations.  It  captures  the  shape  and  connectivity  of  the  structure; 
its  cotries  are  icdqpmident  of  material  properties.  The  matrix  B  has  one  other 
important  property:  if  the  sitructure  is  stable,  then  the  equOibrium  matrix  Has 
full  row  tank  (see  Strang  [35]).  Tbus  the  equilibrium  constraint  for  a  stable 
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structure  satisfies  hypothesis  HI.  Because  each  node  is  attached  to  only  a  small 
number  of  elements,  E  is  extremdy  sparse  when  the  structure  is  large. 

More  generally,  a  finite  element  model  of  a  structure  may  be  composed  of 
elements  which  allow  more  freedom  and  complexity  than  rigid  elastic  bars  (see, 
for  example,  Przemieniecki  [32]).  Figure  2.3  depicts  some  typical  examples: 
a  pair  of  two^limensiimal  planar  elements  (in  this  case,  a  right  triangle  and 
a  square),  and  a  solid  tetrahedral  element.  In  these  examples  we  assume  the 
only  independent  internal  forces  ate  those  shown  in  the  sketches,  and  we  take 
the  comers  of  the  elements  to  be  the  nodes.  Other  more  complex  modeb  are 
possible  (e.g.  elements  which  account  for  the  effects  of  bending,  supports  which 
restrict  nodes  in  some  but  not  all  directions,  etc.),  but  we  limit  our  examples 
to  the  types  of  elements  which  appear  in  our  test  problems. 

In  any  case,  regardless  of  the  types  of  elements  in  the  structure,  one  still  con* 
structs  the  equilibrium  constraint  »  6  by  writing  force  balance  equations 
at  each  free  node.  Now,  however,  each  element  generates  a  hhek  of  odumns 
in  the  equilibrium  matrix  E  (one  column  feu*  each  indepenaent  internal  force). 
Thus,  for  example,  the  equilibrium  matrix  for  a  two^iimensionai  structure  mod¬ 
elled  with  planar  triangular  elenunts  will  have  three  adumns  per  element,  and 
two  tows  per  free  node.  The  matrix  E  will  be  sparse,  and  will  have  full  tow 
rank  when  the  structure  is  stable.  Unlike  the  mUrix  in  figure  2.2,  however,  the 
equilibrium  matrix  will  often  have  far  more  columns  than  rows  (see  the  test 
problems  in  chapter  6>  where  mi  ranges  from  roughly  |  to  |  the  size  of  n«). 

Of  a>urse,  the  equilibrium  condition  does  not  tell  the  whole  story.  Math¬ 
ematically,  there  will  generally  be  tix  infinite  number  of  vectors  y  satisfying 
the  constraint  Ey  sab.  Physically,  we  have  not  yet  accounted  for  the  material 
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properties  of  the  elements  in  the  structure.  A  symmetric  non^negative  defi¬ 
nite  matrix  Ft  called  the  element  flexibility  matrix^  captures  the  material 
properties.  The  solution  we  seek  is  the  set  of  internal  forces  y  which  mini¬ 
mize  complementary  energy  subject  to  the  equilibrium  constraint  (see,  for 
example,  Strang  [35]): 

minimize  y^Fy  such  that  Ey  s  6.  (2-11) 


Thus  we  can  express  the  static  structures  problon  as  a  constrained  minimization 
problem.  Unless  the  elements  are  pre-stiessed,  the  vectors  c  and  s  —  -G^c  given 
in  equations  (2.1)  and  (2.7)  are  zero  in  this  implication.  The  components  of  the 
vector  >-A  in  the  Kuhn-Tucker  formulation  represent  the  small  displacements  of 
the  nodes  from  their  neutral  (unloaded)  positions. 

For  a  truss  ooomosed  of  rigid  elastic  bats,  F  is  a  diagonal  matrix  with 
positive  diagonal  entries  determined  by  Hooke's  Law  (the  entry  in  position  (i,  t) 
of  F  is  the  secipiocal  of  the  Spring*  constant  for  ekment  t;  again,  see  Strang 
[35]).  G,  of  course,  is  a  diagonal  matrix  with  diagonal  entries  equal  to  the 
square  roots  of  the  corieipoading  entries  in  F.  For  more  general  elements,  F  b 
a  block  diagonal  matrix,  with  one  small  block  for  each  element  in  the  structure. 
The  block  associated  with  the  right  Uiangular  planar  element  in  figure  2.3,  for 
example,  is  the  3x3  matrix 


Fae 


i. 

et 


(2.12) 


where  the  ^.hysical  constants  F,  v,  and  t  tcpresent  Young's  Modulus,  Poisson’;t 


Ratio,  and  the  thickimsa  ^  the  element  tespecUvdy  (see  [32|).  The  matrix  is 
positive  definite  for  0  <  v  <  1  (the  >^ue  v  s  0.3  is  typical).  The  blocks 
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Figure  2.4:  Matrices  for  Static  Structures  Prcblem 


associated  with  the  planar  square  element  and  tetrahedral  element  (5x5  and 
6x6  respectively)  are  defined  by  similar  formulas.  In  all  cases,  the  matrix  F 
is  symmetric  positive  definite  when  all  elements  behave  elastically.  Thus,  the 
Cholesky  factor  of  F,  which  is  block  diagonal  with  upper  triangular  blocks, 
can  be  used  as  the  matrix  G  (figure  2.4).  Moreover,  since  G  is  square  and 
non<smgular  for  elastic  problems,  hypothesis  H2  from  the  previous  section  is 
satisfied  trivially. 

At  a  critical  value  of  u,  howevei,  an  element  no  longer  behaves  elastically, 
and  the  block  of  F  associated  with  tne  element  becomes  singular.  This  critical 
value  (i/  ss  1  for  the  planar  elements,  s  ^  for  the  tetrahedron)  represents  the 
point  at  which  an  element  no  longer  changes  area  (or  volume)  when  subjected 
to  a  load  (see  [32]).  At  the  critical  value,  the  block  is  still  symmetric  and  non- 
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negative  definite.  Thus,  when  such  blocks  are  present  in  F,  we  can  still  find 
G  such  that  F  -  Cr^G,  but  now  the  matrix  G  lacks  full  column  rank.  While 
the  linear  theory  described  above  no  longer  provides  an  adequate  model  of  the 
physics  in  this  case,  we  will  use  these  critical  values  to  simulate  “damaged” 
elements  in  the  structure,  g’ving  us  an  opportunity  to  generate  test  problems 
which  are  difficult  to  solve  using  algorithm  BNP  as  originally  derived. 

We  conclude  this  section  by  mentioning  that  the  force  method  described 
above  is  one  of  two  approaches  to  solving  the  static  structures  problem.  A  sec* 
ond  formulation,  the  so-called  displacement  method,  solves  an  unconstrained 
least  squares  problem  for  the  displacements,  and  then  recovers  the  force  vec¬ 
tor  y  (see  Scrang  [35]  for  a  carehri  discussion  of  the  connection,  between  the 
two  methods).  For  many  reasons,  the  displacement  method  is  by  far  the  more 
common  approach  used  to  solve  structures  problems.  The  force  method  formu¬ 
lation  described  here,  however,  has  some  advantages.  In  particular,  it  explicitly 
« 

separates  calculations  involving  the  geometry  of  the  structure  (the  constraint) 
firom  those  involving  the  material  properties  (the  energy  functional),  and  can 
therefore  be  useful  when  analyzing  a  fixed  structure  while  varying  the  material 
properties  (see,  for  example,  Batt  and  Gellin  [1]). 

2.3  Second  Example:  Stokes  Flow 

The  static  analysis  of  engineering  structures  provides  but  one  example  of  a  more 
general  physical  principle  at  work:  minimizing  an  energy  functional  subject  to 
an  equilibrium  constraint  is  a  central  idea  throughout  the  physical  sciences 
(see  Strang  [34],  [35]).  A  second  example,  leading  naturally  to  a  saddle  point 
problem,  is  the  standard  Stokes  model  for  steady,  very  viscous  flows.  If  we 
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consider  only  the  constant  density  case,  and  scale  out  the  Reynolds  number, 
the  equations  describing  the  flow  are  given  by 

V*ji-Vp  =  I  (2.13) 

V-JL  ^  g.  (2.14) 

The  quantity  ji  represents  the  continuous  velocity  vector,  with  one  component 
for  each  coordinate  direction.  The  scalar  p  represents  pressure.  The  first  equa¬ 
tion  comes  from  conservation  of  momentum;  the  vector-valued  forcing  term  £ 
reflects  external  forces  such  as  gravity.  The  Laplacian  operator  acts  on  each 
component  of  g,  separately.  The  second  equation  reflects  conservation  of  mass; 
in  the  absence  of  sources  or  sinks,  the  righthand  side  g  is  zero. 

We  must  also  specify  appropriate  boundary  conditions.  We  consider  only 
the  Dirichlet  boundary  condition  li  ~  (the  case  2  0,  which  is  physically 

plausible  for  viscous  flows  at  a  solid  boundary,  is  the  so-called  ‘‘no-slip”  bound¬ 
ary  condition).  Note  that  only  the  derivative  (gradient)  of  pressure  appears 
in  the  differential  equations.  Thus,  in  the  absence  of  boundary  conditions  on 
pressure,  we  can  determine  pressure  only  up  to  a  constant. 

When  we  discretize  the  continuous  problem,  we  expect  to  obtain  the  saddle 
point  form  (2.9).  The  matrix  E  will  approximate  the  divergence  operator,  and 
the  conservation  of  mass  equation  will  become  the  constraint  Ey  s  6.  The 
vector  6  will  reflect  boundary  contributions  as  well  as  the  forcing  term  g.  The 
components  of  the  Lagrange  multiplier  n  will  represent  pressure.  Since  the 
neg.itive  of  the  gradient  is  the  adjoint  of  the  divergence,  we  will  find  that  E'^n 
will  ximate  -Vp.  The  symmetric  positive  definite  matrix  F  will  come 
from  a  discretization  of  the  negative  of  the  Laplacian  operator  V^,  and  s  will 
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Figure  2.5:  Marker-and-Cell  Discretization  of  Unit  Square  (n«=:3) 


reflect  both  boundary  contributions  and  the  forcing  function  £. 

The  test  problems  m  chapter  6  are  based  on  a  marker>and-cell  or  MAC 
finite  difference  discretization  first  studied  by  Harlow  and  Welch  [16]  (see  also 
Hall  [15]).  Here  we  consider  the  MAC  method  for  the  Dirichlet  problem  on  the 
unit  square.  Let  x  and  z  be  the  horizontal  and  vertical  coordinate  directions,  and 
begin  by  dividing  the  domain  into  cells  as  shovm  m  figure  2.5.  For  simplicity 
we  consider  a  regular  partitioning  of  the  domain:  let  n,  be  the  number  of 
cells  in  each  coordinate  direction,  and  h,  s  1/n,  the  horizontal  and  vertical 
length  of  each  cell.  We  specify  the  pressure  n  at  nodes  placed  at  the  center  of 
each  cell.  Now  connect  the  pressure  nodes  with  directed  line  segments.  The 
components  of  the  discrete  velocity  vector  y  are  specified  at  the  midpoints  of 
these  directed  line  segments:  horizontal  components  of  velocity  on  horizontal 
line  segments  (edges),  and  vertical  components  of  velocity  on  vertical  edges. 
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Figure  2.6:  MAC  Stencil:  Divergence  Operator 


Choose  a  convenient  numbering  of  the  nodes  and  edges.  For  now,  we  number 
all  the  horizontal  edges  before  the  vertical  ones,  and  we  number  both  nodes  and 
edges  fr6m  left  to  right,  beginning  at  the  bottom.  Note  that  the  boundary  of 
the  grid  does  not  reach  the  physical,  boundary;  instead,  it  is  offset  by  a  distance 
ofW2. 

We  discretize  the  conservation  of  mass  equation  V'H  —  ghy  writing  a  flow 
balance  equation  at  each  node  in  the  grid,  mudi  as  we  did  for  the  structures 
problem  in  the  previous  section.  Letting  the  subscripts  N,  5,  E,  W,  and  C  rep¬ 
resent  the  north,  south,  east,  west,  and  center  positions  on  a  stencil  (figure  2.6), 
centered  differences  at  each  interior  pressure  node  produces  the  equation 

iVE  —  Vw)  +  (l/jv  “  ys)  =  K9C‘  (2.15) 

At  boundary  nodes,  one  or  more  values  of  y  are  given  by  the  boundary  data, 
and  must  be  brought  to  the  righthand  side  of  the  flow  balance  equation.  For 
mcample,  at  nodes  on  the  interior  of  the  the  west  wall  of  the  grid,  yw  is  known, 
and  the  equation  becomes 


yj5 + yw  -  ys  =  f*$9o + yw- 


(2.16) 
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Figure  2.7:  Matrix  Eq  for  MAC  Grid  (n,=3) 

If  we  now  write  the  collection  of  all  such  equations  as  a  linear  system,  we 
obtain  a  matrix  equation  of  the  form 

Ecy  =  bo.  (2.17) 

The  matrix  Eq  for  the  case  n«s3  is  shown  in  figure  2.7;  the  pattern  for  larger 
n«  is  similar.  But  Eo  is  not  quite  the  equilibrium  matrix  we  seek.  We  will 
produce  the  matrix  E  and  the  associated  constramt  Ey  ^b  after  a  modification 
described  later  in  this  section. 

Before  completing  the  construction  of  the  constrmnt,  we  address  the  dis* 
cretization  of  the  conservation  of  momentum  equation  (2.13).  Recognize  first 
that  the  equation  is  a  vector  equation  with  components  in  the  two  coordinate 
directions: 

V*vi  —  p»  «  /i  (» component)  (2.18) 

V*W3— p,  55  /j  (*  component).  (2.19) 

Here  and  Va  represent  the  x  and  z  components  of  the  continuous  velocity 
vector  m  Pa  and  p,  are  the  derivatives  of  pressure  with  respect  to  z  and  z 
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Figure  2.8:  MAC  Stendl:  Hori2ontal  Momentum 

respectively;  and  /i  and  /j  are  the  components  of  the  vector-valued  forcing 
term  f. 

We  disaetize  the  conservation  of  momentum  equation  by  writing  a  difference 
equation  at  the  center  of  each  edge  of  the  grid:  on  horizontal  edges,  we  use  the 
horizontal  equation  (2.18),  while  on  vertical  edges  we  use  (2.19).  Consider,  for 
example,  a  typical  horizontal  edge  with  associated  discrete  velocity  component 
yc*  Recalling  that  the  Lagrange  multiplier  n  represents  pressure  at  the  nodes, 
and  using  the  subscripts  N,  5,  W,  and  C  as  we  did  above,  we  work  with 
the  stencil  depicted  in  figure  2.8.  Using  centered  differences,  and  multiplying 
through  by  h,  (not  h]),  we  obtain  the  equation 

-  ^(4yo'  “  Vat  -  ys  -  ys  -  yw)  -  {fis  -  Mw)  =  (2*20) 

where  /i  is  evaluated  at  the  midpoint  of  the  edge  associated  with  yc.  A  sim¬ 
ilar  equation  results  firom  discretizing  equation  (2.20)  at  the  center  of  interior 
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(a)  near  easi  boundary 


(b)  near  north  boundary 


Figure  2.9:  MAC  Boundary  Corrections:  Horizontal  Momentum 
vertical  edges. 

Because  the  boundary  of  the  grid  and  the  physical  boundary  do  not  coincide, 
the  treatment  of  the  edges  near  the  boundary  is  a  bit  delicate.  There  are  two 
cases  to  consider.  The  first  case  involves  an  edge  yc  adjacent  and  perptndicxilar 
to  the  boundary,  sv.ch  as  the  edge  near  the  east  boundary  shown  in  figure  2.9a. 
This  situation  presents  no  special  problems:  a  fictitious  edge  {ys  in  the  example) 
has  a  known  value  given  by  the  boundary  data.  We  simply  substitute  this  known 
value  into  equation  (2.20)  (or  the  equivalent  equation  in  the  vertical  direction 
if  appropriate),  and  move  it  to  the  right-hand  side.  For  the  example  we  obtain 

“  ^(4yo  -  yjv  -*  ys  “  yw)  -  {tts  “•  i^w)  =  Kfi  -  (2.21) 

The  second  case  involves  an  edge  yc  adjacent  and  parotid  to  the  boundary, 
such  as  the  edge  near  the  north  boundary  in  figure  2.9b.  Here  the  fictitious 
edge  {ypf  m  the  example)  lies  outside  the  physical  boundary,  so  its  value  is  not 
specified  by  the  boundary  data.  Instead,  we  know  the  value  at  a  fictitious  edge 
ya  lying  on  the  boundary  midway  between  yc  and  ys>  We  now  interpolate  to 


obtun  an  approximate  value  for  the  fictitious  edge:  ys  =  \{yN  +  yc)*  or 

VN  =  2yB  -  yc-  (2.22) 


Now  substitute  yn  into  equation  (2.20)  and  rearrange  to  get  what  we  need: 

1  2 

”  “  ys  —  yc  -  yw)  -  (mb — mw)  =  h,/x  -  r-yjv.  (2.23) 

Aj  A# 

Of  course  edges  near  comers  of  the  physical  domain  require  both  types  of  cor¬ 
rections. 

If  we  write  the  collection  of  all  the  edge  equations  as  a  linear  system,  we 
obtain  a  matrix  equation  of  the  form 

-  Fy  +  Eln  ss  s,  (2.24) 


where  Eq  is  precisely  the  matrix  defined  in  equation  (2.17).  The  matrix  F 
contains  two  copies  of  the  disaetized  Laplacian: 


(2.25) 


where  Fh  Fv  are  associated  with  the  horizontal  and  vertical  edges  respec¬ 
tively.  More  precisely,  Fh  and  Fv  are  the  block  tridiagonal  matrices 


Fh  = 


D? 

-/  D?  *•. 


Dl 


DX  -/ 
-/  DX 


-/ 


,  (2.26) 


where  =  DjX,  is  the  (n,  —  l)x(n,- 1)  symmetric  tridiagonal  matrix  with  5’s 
on  the  main  diagonal  and  I’s  on  the  sub-  and  superdiagonals,  s  . , .  = 
is  the  (n«  ~  l)x(n«  ~  1)  symmetric  tridiagonal  matrix  with  4’s  on  the  main 
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diagonal  and  l*s  on  the  sub*  and  superdiagonals,  and  DX  is  the  n«xn«  matrix 


5  -1 

-1  4  -1 

-1  4  -1 


•  4  • 

-1  4  -1 

-1  4  -1 

-1  6 


(2.27) 


for  t  =:  1, . . . ,  n«  —  1.  In  all  cases,  /  represents  the  appropriately  sized  identity 
matrix.  While  it’s  helpful  to  appreciate  the  structure  of  the  matrix  in  general 
it  need  not  be  formed  explicitly.  We  can  compute  a  matrix*vector  product  of 
the  form  Fw  in  terms  of  the  stencils  defining  the  action  of  f*  on  an  arbitrary 
vector  to. 

At  this  point,  our  discretized  system  has  the  form 


(2.28) 


This  is  certainly  a  saddle  point  system,  but  the  matrix  £b  does  not  satisfy  the 
required  hypotheses.  Refer  to  figure  2.7,  and  note  that  £b  lacks  full  row  rank 
(the  stun  of  the  rows  of  Eq  is  the  zero  vector).  This  is  no  surprise:  smce  we  are 
assuming  Dirichlet  boundary  conditions,  every  edge  enters  one  node  and  exits 
another,  so  every  column  of  Eq  contains  a  single  entry  equal  to  - 1,  a  single  entry 
equal  to  +1,  and  no  other  non>zero  entries.  Up  to  discretization  error,  the  same 
is  true  of  the  righthand  side  vector  ho;  this  is  a  straightforward  consequence  of 
Green’s  Theorem.  We  need  to  produce  a  constraint  Eystb  such  that  E  has  full 
tow  rank.  We  can  do  this  in  the  obvious  way,  by  deleting  any  convenient  row  of 
Eq  and  the  corresponding  component  of  bo  (more  precisely,  by  annihilating  the 
row  using  Gauss  elimination  or  orthogonal  reduction).  But  we  must  examine 
the  effect  of  this  change  on  the  discrete  conservation  of  momentum  equation. 
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To  do  this,  recall  that  the  continuous  problem  determines  pressure  only  up 
to  a  constant.  Thus  if  we  specify  the  value  of  pressure  at  any  single  node,  we 
should  expect  to  determine  the  pressure  field  uniquely.  For  convenience,  set 
the  last  component  of  /t  to  zero.  Let  be  the  corresponding  row  of  £o,  and 
partition  Eq  and  /i  into 


and  n 


* 

0 


(2.29) 


Since  s  £^/i,  replacing  JEb  with  E  (and  p  with  p)  is  equivalent  to  ‘‘ground¬ 
ing’*  a  single  pressure  node  to  zero.  Thus  we  can  safely  replace  Eo  with  E 
throughout  the  saddle  point  system  (2.28).  The  matrix  has  full  row  rank,  so 
hypothesis  HI  holds.  Motever,  F  is  symmetric  positive  definite,  so  there  exists 
a  G  of  full  column  rank  such  that  F  s  CF'G.  Thai,  since  G  has  no  non-trivial 


nuUspace,  hypothesis  H2  holds  vacuously. 

There  is  still  one  important  ta^  ronaining.  We  will  need  the  matrix  G  to 
construct  one  type  of  preconditioner  for  the  Stokes  prcdilem  (see  chapter  5).  The 
Cholesky  factor  <d  Ft  however,  is  &r  too  dense  to  be  practical  for  this  purpose. 
Instead,  we  exploit  the  fact  that  the  Laplacian  operator  is  the  divergence  of  the 
gradient.  We  can  therefore  define  to  be  the  appropriate  discretization  of 
the  divergence  acting  on  the  velocity  edga  of  the  grid.  Since  the  adjoint  of  the 
divergence  is  the  negative  of  the  gradient,  we  find  that  G  satisfies  F  s  Cf^G 
as  required.  Using  centered  differences  and  techniques  described  in  Strang  (35], 
we  obtain  G  of  the  form 


G 


1 


Oh 


Gv 


(2.30) 


idiere  -Oh  and  —Gv  are  discrete  gradients  acting  on  the  horizontal  and  vertic^ 
edges  of  the  grid  respectively.  Figure  2.10  depicts  the  matrices  for  the  case 
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the  pattern  is  the  same  for  larger  n«  as  well.  It  will  prove  helpful  to  observe 
that  the  rows  of  G  can  be  reordered  so  that  the  upper  block  of  the  matrix  is 
upper  triangular  and  non^singular. 

2.4  Substructured  Problems 

When  workixig  with  discrete  modeb  such  as  those  in  §2.2  and  §2.3,  the  order¬ 
ing  of  the  nodes  and  elements  is  an  important  issue.  The  chmce  of  ordering 
affects  virtually  every  aspect  of  computational  performance,  including  program 
complexity,  storage  requirements,  and  convergence  of  iterative  schemes.  Here 
we  dbcuss  a  standard  i^ay  to  order  the  nodes  and  elements  to  improve  the 
opportunities  for  parallel  computation.  The  ordering  technique,  known  as  sub- 
structuring  or  domain  decomposition,  involves  grouping  the  components 
of  the  model  into  contiguous  pieos.  The  litarature  on  the  subject  b  vast;  see 
Ortega  and  Voigt  [27]  for  an  annotated  introduction. 

Consider,  for  example,  the  structure  WRENCH  depicted  in  2.11  (thb  model 
b  one  of  six  small  test  problems  developed  by  M.  Lawo  [6]).  Partition  the 
model  into  a  desired  number  of  dbjoint  aubatructurea  such  that  each  node  b 
in  mractiy  one  substructure.  There  are  now  two  types  of  elements  in  the  model. 
Most  elements  (the  dark  shaded  elements  in  the  figure)  interact  only  with  nodes 
in  a  tingle  substructure.  Other  elements,  called  iranaition  elements  (the 
lightly  shaded  elements  in  the  figure),  interact  sdth  nodes  from  mote  than  one 
sttbatmcture.  Number  the  nodes  and  elements  in  the  Ir^cal  way:  all  nodes  (and 
elements)  in  subatructuie  A,  then  B,  etc  D^er  the  transition  elements  until 
last,  regardleM  of  their  physical  location  in  the  structure.  For  our  purposes,  it  b 
neither  necessary  nor  approptiibe  to  use  transition  nodes  in  substructuring  the 
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Figure  2.11:  Subatructuxing  4  Finite  Element  Model 


model;  see  Plemmons  and  White  [31]  for  a  description  of  the  matrices  res^dting 
from  such  a  substructuxing. 

Because  the  interactions  between  the  nodes  and  the  elements  determine  tKe 
noo'setos  in  the  equilibrium  matrix«  we  find  that  the  partitioned  model  leads  to 
a  matrix  E  with  non'Zexot  confined  to  the  blocks  shown  in  the  figure.  Elements 
in  substructure  A,  for  examine,  produce  ttoa*xeros  confined  to  the  ^diagonal’* 
block  labeled  A,  while  the  transitimi  dements  produce  the  tranritioa  block 
labelled  T  in  the  figure.  The  matrices  F  and  G,  which  ate  block  diagonal 
with  one  small  bbdc  for  eadi  element,  teiain  their  structure  regardless  of  the 
ordering. 

The  diagonsi  bbdcs  in  the  substructured  equilibrium  matrix  deserve  further 
comment.  Note  that  each  diagonal  block  of  £  b  itsdf  an  equilibrium  matrix 
for  the  attodated  portion  of  the  model  viewed  as  an  independent  structure.  If 
thu  i^iysical  substructure  U  stable  (see  §2.2),  then  the  diagonal  Mock  has  fidl 
row  ranl^  otherwise,  it  b  rank  deficieni.  In  the  eotample,  substructures  C  and 
D  are  stable  (they  are  fixed  to  the  immovable  base  on  the  rij^t  side,  and  have 
rnffigymt  iatetnal  support),  while  substructures  A  and  B  are  not  (they  have  no 


27 


Substiuciura  A 

Transition  &ige8  |  i  |  |  |  |  !  | 

Substnicture  B 


Figure  2.12:  Narrow  Substructuring  of  MAC  Grid 

external  support).  Thus  the  first  two  diagonal  blocks  in  E  lack  full  row  rank. 

Later,  we  address  ways  to  exploit  the  substructiued  form  of  the  matrix  E, 
For  now,  we  er  .phasize  that  the  ideal  situation  is  one  in  which  the  diagonal 
blocks  are  aU  roughly  the  same  size,  the  number  of  substructures  is  the  same 
as  the  number  of  available  processors,  and  the  transition  block  has  relatively 
few  columns.  Problem  WRENCH  is  far  too  small  to  justify  the  use  of  four 
substructures  in  practice^  we  use  it  only  to  illustrate  the  concept. 

The  process  of  substructuring  the  MAC  grid  for  the  Stokes  problem  is  sim¬ 
ilar  to  what  we  have  just  done,  but  is  somewhat  more  delicate.  Begin  with 
the  grid  derived  in  the  previous  section,  and  partition  the  pressure  nodes  into 
substructures  (see  the  left  portion  of  figure  2.12  for  an  example  involving  two 
substructures;  for  clarity,  we  do  not  include  arrows  on  the  velocity  edges).  Let 
the  vertical  edges  connecting  the  substructures  serve  as  the  transition  elements 
as  shown  in  the  exploded  view  (right  half  of  the  figure).  This  leads  to  a  sub¬ 
structured  equilibrium  matrix  analogous  to  that  derived  above,  just  as  one  would 
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expect. 

This  substructuring  of  the  MAC  grid  is  useful  for  some  purposes;  we  employ 
it  for  one  set  of  experiments  in  chapter  6.  It  is  deficient  in  one  important  respect, 
however:  for  this  partitioning  of  the  edges,  the  matrices  F  and  G  do  not  reflect 
the  substructuring.  To  see  why,  first  observe  that  £7  is  an  node-edge  matrix  in 
the  sense  that  the  rows  of  E  correspond  to  nodes  in  the  grid,  while  the  columns 
of  E  correspond  to  edges  (see  Strang  [35]).  Thus,  a  substructured  matrix  E 
results  when  tdgts  in  a  given  substructure  intereurt  only  with  nodes  in  that 
substructure.  The  matrix  F,  on  the  other  hand,  is  an  edge-edge  matrix:  both 
the  rows  and  the  columns  of  the  matrix  are  associated  with  edges  in  the  grid. 
Thus,  we  obtain  a  substructured  form  for  F  when  edges  in  a  given  substructure 
interact  only  with  other  edges  in  that  substructure.  But  the  Laplacian  stencil 
defining  the  non-zeros  due  to  the  horizontal  edges  (§2.3)  causes  a  problem. 
Consider  a  stencil  centered  at  a  horizontal  edge  on  the  top  boundary  of  the  lower 
substructure  (figure  2.13;  the  edges  involved  m  the  stencil  appear  as  thick  lines 
in  the  magnified  view).  Note  that  the  stencil  mvolves  edges  firom  both  the  upper 
and  lower  substructures,  violating  the  requirement  that  the  two  substructures 
do  not  interact  (the  vertical  stencil  causes  no  problem  in  this  example). 

We  can  solve  this  difficulty  with  a  simple  change.  We  leave  the  partitioning 
of  the  nodes  unchanged,  but  include  as  transition  edges  those  horizontal  edges 
which  are  adjacent  to  a  transition  zone.  Thus,  a  given  transition  zone  in  the 
physical  grid  consists  of  a  row  of  vertical  edges  and  two  rows  of  horizontal 
edges,  as  shown  in  the  left  p<»tion  of  figure  2.14.  With  this  choice  of  transition 
edges,  the  symmetric  matrix  F  has  the  structure  shown  in  the  right  half  of  the 
figure.  There  is  a  large  diagonal  block  associated  with  each  substructure;  in 
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fact,  each  such  block  is  the  Lapladan  acting  on  the  substructure,  and  has  the 
form  described  in  equations  (2.25)  and  (2.26).  Additionally,  there  eue  horizontal 
and  vertical  strips  associated  with  transition  edges.  The  matrix  G,  which  comes 
from  a  gradient  stencil  that  is  a  subset  of  the  Lapladan  stencil  used  for  F  (recall 
§2.3),  has  a  substructured  form  resembling  the  transpose  of  the  equilibrium 
matrix.  We  will  make  use  of  the  substructured  form  of  both  F  and  G  in  one  of 
the  algorithms  we  develop  in  chapter  5. 
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3.  Description  of  the  Algorithms 


In  this  chapter,  we  introduce  four  iterative  algorithms  for  solving  problem 
LSE:  p*cyclic  successive-overrelaxation  (SOR),  block  accelerated  overrelaxation 
(AOR),  a  preconditioned  form  of  the  weighting  method,  and  of  course  algorithm 
BNP.  We  compare  and  contrast  these  algorithms  in  chapter  4. 

Three  of  these  algorithms  (p-cyclic  SOR,  block  AOR,  and  algorithm  BNP) 
iterate  on  a  certam  modified  form  of  the  Kuhn-Tucker  equations;  the  fourth 
(preconditioned  weighting)  u  closely  related  to  this  modified  form.  We  there¬ 
fore  begin  by  constructing  the  modified  Kuhn-Tucker  equations  (§3.1),  and 
include  a  discussion  of  the  computations  necessary  to  achieve  this  form.  We  em¬ 
phasize  the  parallel  aspects  of  the  computation,  describing  a  technique  called 
interlacing  proposed  by  James  and  Plemmons  [20].  We  outline  each  of  the 
four  algorithms  in  the  remaining  sections.  In  §3.2  we  detail  algorithm  BNP  as 
introduced  in  [4].  We  mention  a  source  of  inaccuracy  in  the  original  version 
of  the  algorithm  (discussed  in  mom  detail  in  chapters  5  and  6),  and  propose 
a  simple  correction.  Section  3.3  is  an  mtroduction  to  the  other  three  itera¬ 
tive  algorithms;  we  emphasize  established  results  which  will  prove  important  in 
subsequent  chapters. 
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3.1  Modified  Kohn-Tucker  Equations 


It  is  difficult  to  apply  traditional  iterative  methods  to  the  Kuhn-Tucker  equa¬ 
tions  as  written  in  (2.6):  the  diagonal  blocks  are  not  even  square,  let  alone 

non-singular.  We  therefore  start  by  repartitioning  these  equations.  Since  we 

E 

are  assuming  E  has  full  row  rank,  and  that 


Gx 

G2 


reorder  the  rows  of  G  and  c,  and  repartition 

so  that  the  matrix  defined  by 


has  full  column  rank,  we  can 


(3.1) 


E 

G\ 


(3.2) 


is  square  and  non-singular.  Defining  At  s  Gi  for  convenience,  we  now  have 

(3.3) 


E 

G 


Ax 

Ai 


Make  these  substitutions  in  (2.6)  and  reorder  the  column  blocks  to  obtain  the 
modified  Kuhn-Tucker  equntiona; 


Ax  0  K 
Ai  I  0 

0  AT 


r  ^ 

V 

hx' 

ra 

ss 

Ci 

.  *3, 
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(3.4) 


where  c  s 
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w  4 
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.  *3  » 
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rx 

k  d 

and  K  » 


0  0 
0  I 


Note  that  the  diagonal  blocks  ate  now  square  and  non-singular. 

We  also  observe  that  the  ordinary  (unconstrained)  least  squares  problem 


minimise  ^Gy  -  c{]} 


(3.5) 


may  be  viewed  as  a  '‘constrained*'  problem  with  zero  constraints.  We  can  re¬ 
order  the  rows  of  G  and  c  as  necessary,  and  partition  G  a  J  ^  I  so  that 
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Figure  3.1:  Forming  A\  from  trapezoidal  Et 


is  square  and  non-singulcir.  It  then  becomes  easy  to  write  a  set  of  modified 
Kuha-Tucker  equations  analogous  to  equation  (3.4):  Ai  consists  entirely  of  Gj, 
6i  equals  cj,  and  an  identity  matrix  /  replaces  A'.  In  chapter  4,  it  will  prove 
helpful  to  view  the  unconstrained  problem  vhis  way. 

In  each  of  the  algorithms  we  consider,  we  will  repeatedly  solve  systems  of 
equations  which  have  Ai  and  Aj  as  the  coefficient  matrix.  Thus,  the  key  to 
using  the  modified  Kuim-Tucker  equations  is  producing  an  Ay  whicli  is  easily 
invertible.  The  most  convenient  form  to  seek  is  an  upper  triangular  matrix.  If 
G  has  full  colunui  rank  (a  stronger  assumption  than  112),  one  way  to  accomplish 
this,  depicted  in  ligur:;  3.1,  is  as  follows: 

I.  Use  Gauss  FI  mination  (or  orthogonal  rc(*uctiun)  with  column  pivoting  to 
replace  E  with  the  upper  trapezoidal  matrix 

Et:^\Ei  Er],  (3.6) 

where  El  is  upper  triangular  and  non-singular.  Here  the  subscript  t  iudi- 
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cates  “trapezoidal,”  while  L  and  R  indicate  left  ajid  right  respectively. 


2.  Apply  the  same  column  interchanges  to  G,  and  partition  compatibly  with 
Ef  Use  orthogonal  rotations  on  G  and  c  to  replace  G  with  the  non-singular 
upper  triangular  matrix 


G  = 


G21  G22 
Gi2 


(3.7) 


Here  the  leading  subscripts  indicate  whether  the  blocks  will  become  paxt 
of  Gi  or  G2. 


3.  Define  Gi  =  0  Gj2  j  and  G2  —  |  G21  G\ 

Ai  = 


22 


giving  us 


El  Er 
Gi2 


(3.8) 


While  this  method  of  constructing  Aj  is  useful  for  analysis,  it  may  not  be 
efficient:  column  interchanges  may  destroy  exploitable  structure  in  the  matrices 
E  and  G.  Another  approach,  first  proposed  by  James  and  Plemmons  [20],  is 
to  reduce  E  without  ^umn  pivoting  to  produce  a  “stairstep”  form  E,  (figure 
3.2).  Now  form  Ai  by  interlacing  rows  of  G  with  rows  of  jE,.  A  permutation 
matrix 

P=[Pl  (3.9) 

defines  the  interlacing: 


Ai  =  P 


Es 

Gx 


=  PlE,  +  PrGi, 


(3.10) 


Notice  that  the  same  permutation  matrix  (not  its  transpose)  relates  the  stairstep 
and  trapezoidal  forms: 


e,Pr\^[El  £«)  =  £;.. 


(3.11) 
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When  the  matrices  E  and  G  reflect  a  substructuring  of  the  physical  do¬ 
main,  interlacing  without  column  permutations  is  especially  helpful  for  parallel 
computations.  Given  a  substructured  equilibrium  matrix,  we  can  produce  the 
stairstep  matrix  in  parallel  by  assigning  each  diagonal  block  of  to  a  sep¬ 
arate  processor.  The  only  subtle  issue  is  the  possibility  that  a  given  diagonal 
block  does  not  have  full  row  rank.  To  account  for  this  possibility,  we  produce 
the  matrix  Es  as  follows  (figure  3.3): 

1.  Begin  with  E  in  substructured  form. 

2.  Factor  each  row  block  in  parallel.  Terminate  computations  on  a  row  when 
the  leading  non-zero  in  that  row  is  in  the  transition  block. 

3.  Consider  all  rows  which  now  have  leading  non-zeros  in  the  transition  block. 
Move  them  (implicitly,  using  a  pointer  vector)  to  the  bottom  of  the  matrix. 

4.  Factor  the  transition  block. 
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a.  Begin  with  Substructured  E  b.  Factor  Row  Blocks  in  Paurallel 


c.  Move  ‘^ansition  Rows”  to  Bottom  d.  Factor  ‘^ansition  Rows” 

Figure  3.3:  Forming  Stairstep  E,  from  Substructured  E 

If  column  pivoting  is  needed  or  desired  when  reducing  E,  one  can  preserve 
the  structure  of  E  by  restricting  eligible  pivot  columns  to  those  associated  with 
the  current  diagonal  block  of  the  matrix  (however  one  must  consider  the  effect 
of  such  pivoting  on  both  the  matrix  F  and  the  intermediate  fill  required  to 
generate  G),  In  any  case,  one  can  now  complete  the  construction  of  Ai  by 
interlacing  (figure  3.4)  exactly  as  we  did  above. 

Notice  that  triangular  solves  mvolving  Ai  provide  opportunities  for  block- 
based  parallelisuL  One  first  solves  for  the  unknowns  associated  with  columns 
of  the  transition  block  (i.e.  the  final  block  of  unknowns).  Once  this  is  complete, 
one  can  recover  the  remaining  unknowns  in  parallel  by  assigning  each  diagonal 
block  of  Ax  to  a  separate  processor.  A  similar  algorithm  applies  to  triangular 
solves  with  the  matrix  Af  as  well 

One  triangular  solve  of  each  type  will  occur  in  each  iteration  of  the  algo- 
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Figure  3.4:  Forming  Ai  from  Substructuied  E, 


rithms  we  consider;  in  fact,  these  solves  will  account  for  a  major  percentage  of 
the  execution  time.  Clearly  the  size  of  the  transition  block  is  a  key  limiting  fac¬ 
tor  in  the  parallel  performance  of  these  computations.  Also  important,  however, 
are  the  relative  size  and  structure  the  diagonal  blocks  of  if  these  blocks 
are  widely  varying  size  or  sparsity,  the  parallel  performance  of  the  triangular 
solves  will  suffer. 


3.2  Algorithm  BNP 


The  derivation  of  Algorithm  BNP  as  presented  in  Barlow,  Nichob,  and  Plem- 
mons  [4]  begins  with  the  modified  Kuhn-Thcker  equations 


Ax  0  K 

•  « 

y 

‘6i' 

Ai  I  0 

r, 

S3 

cj 

.0  4  4. 

.  *3  . 

0 

w  .f 

(3.12) 
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where  the  symbols  are  defined  in  equation  (3.4).  The  authors  then  rewrite  this 
system,  applying  block  elimination  to  reduce  it  to  block  upper  triangular  form. 
The  result  of  this  reduction  is  the  system 

'  Ai  0  K  1  r  1  r  ^ 

/  ra  «  ca-AaAr'6i  .  (3.13) 

(4  +  AJAjAr'if )  j  L  *3  J  L  -^(AiAr’6:  -  tj) . 

Ooutinuing  as  in  [4]t  mnltiply  ths  third  block  equation  by  to  obtain  an 
unsymmetric  system  in  the  unknown  x^: 

(I  +  Ar^4Aa>lr‘X)ra  =  AfA^iA^A^^h  -  c,).  (3.14) 


To  simplify  this  system,  temporarily  define  B  =  Af^Aa  AaAj"',  noting  that 
B  is  symmetric  non-negative  definite.  Then,  in  terms  of  B^  the  system  in  23 
has  the  form 

{I^BK)ii^d,  (3.15) 


where  d  temporarily  represents  the  righthand  side  in  (3.14). 


Let 

r  =  5  (3.16) 

be  the  rightmost  block  of  the  matrix  K,  noting  that  R'^R  s  I  and  RR^  as  K. 
Moreover,  dMerve  that  an  arbitrary  matrix-matrix  product  of  the  form  $R  is 
simply  the  rii^t  block  of  5,  while  R^S  produces  the  bwer  block  of  5. 

Now  partition  B  compatibly  with  K  and  Ri 


Given  this  partitioning,  equation  (3.14)  can  be  written  as 


/  Bta 
.0  (/-bBaa) 


(3.18) 


Thus  the  lower  block  in  the  block  upper  triangular  system  (3.13)  is  itself  block 
upper  triangular,  and  from  its  lowest  block  we  obtain  an  equation  in  the  un¬ 
known  ri: 

+  (3.19) 

Bn  is  symmetric  non-negative  definite,  so  (/  -(-  Bn)  is  symmetric  positive 
definite.  If  we  define  the  matrix 

(3.20) 

then  Bn  is  simply  y^,  end  the  system  in  ri  is  the  symmetric  positive  definite 
system 

(/  +  i'^y)ri  =s  h,  where  h  » l’'^(A5>lJ‘'6i  —  cj).  (3.21) 

Algorithm  BNP  solves  this  system  by  the  coii^ugate  gratfient  algorithm, 
wcffldng  with  the  coefficient  matrix  in  factored  form.  For  this  reason,  we  will 
dt(sa  rder  to  (3.21)  simply  as  the  BNF  aystom.  There  are  n~mi  unknowns 
in  the  system,  psori^ilng  a  theoretical  upper  bound  on  the  maximum  number 
of  amjugate  gradient  iteraUo&s  required  for  convergence  Mor«>ver,  as  Barlow, 
Nichob,  and  Plemmcns  observe  in  (4],  Y  may  lack  full  coiumn  rank; 

Rank(/i’) 

ft  “*  ffti  (3.22) 

Eaak(A3) 

min(n,  mi  +  -  n).  (3.23) 

llus^vesus 

IUnk(y)  <  min(n~mi,mi4*m3~n) 

tnin(  #  of  tows  in  Oi,  #  of  tows  m  Gj  ).  (3.24) 


Rank(y)  £ 

ss 

Raak(K)  S 
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If  Y  do«3  m  fact  lack  full  column  rank,  then  zero  is  an  eigenvalue  of  Y^Y 
(with  multiplicity  determined  by  the  rank  defideni^),  and  the  coeffident  matrix 
(J+y^Y)  has  dusteied  dgenvalues  at  unity.  This  further  reduces  the  theoretical 
number  of  conjugate  gradient  iteratioi  i  required  for  convergence. 

If  G  is  square  (as  it  is,  for  example,  in  the  engineering  application),  then 
ma^n,  and  the  dustering  analysis  simplifies  somewhat.  For  mi  <  1,  the 

niMrinwm  number  of  iterations  is  controlled  by  the  column  rank  of  Y,  and  must 
be  less  than  or  equal  to  mi +1,  which  in  turn  is  no  larger  than  |n.  For  mi  >  ^n, 
the  maximum  number  of  iterations  is  bounded  by  the  size  of  the  BNP  system, 
and  must  be  less  than  <»  equal  to  n— mi,  which  in  turn  is  no  greater  than  ^n. 
Thus  the  theoretical  upper  bound  on  the  iterations  is  at  most  ^n,  with  the  worst 
case  occurring  when  mi  s  |n.  (At  first  glance,  there  is  an  apparent  paradox 
here:  given  an  ordinary  least  squares  problem,  which  has  zero  constraints,  the 
analysis  above  suggests  we  should  expect  convergence  in  one  iteration.  But  if 
we  have  an  ordinary  least  squares  problem  with  a  square  G,  then  Ai  ss  G.  We 
ate  presuming  that  systems  which  have  A\  as  the  coeffiduit  matrix  are  easily 
solved.  If  this  were  the  case  for  an  ordinary  least  squares  problem,  we  would 
have  a  trivial  problem,  and  there  would  be  no  need  to  iterate  at  all.) 

Of  course,  our  goal  is  to  find  y,  not  just  r|.  In  ptindple,  we  could  tter^  until 
convergence  to  solve  for  rt,  then  backsolve  through  equations  (3.15)  and  (3.13) 
to  recover  the  solution  y.  In  praedee,  however,  we  have  another  alternative: 
the  analysis  in  [4]  shows  that  a  variation  of  the  eoqjugaie  gradient  algorithm 
applied  to  the  BNP  system  generates  all  that  is  necessary  to  obtain  an  estimate 
for  y  at  each  iter^on. 

To  see  why  this  is  so,  first  note  that  Ks^  as  J^r^.  This  allows  us  to  rewrite  the 
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first  equation  in  the  modified  Kuhn-Tucker  system  (3.12),  producing  a  formula 
relating  y  and  n: 

Aiy  +  5’ri  =  fci.  (3.25) 

Now  apply  the  conjugate  gradient  algorithm  to  the  BNP  system,  using  an  arbi¬ 
trary  guess  of  Let  represent  the  approximation  to  ri  at  the  fcth  iterate, 
and  Tise  equation  (3.25)  to  define  a  corresponding  approximation  to  y: 

(3.26) 


Initialize  consistent  with  this  definition. 

At  each  iteration  of  the  conjugate  gradient  algorithm,  we  will  correct  the 
estimate  of  ri  a  the  standard  way,  tising  a  recursion  of  the  form 


(3.27) 


Here  Sk  is  a  direction  vector,  and  7*  scales  that  vector  to  the  appropriate  length. 
Now  from  equation  (3.26),  we  know  that  ss  AJ" ^(5i  -  Substitute 

(3.27)  into  the  latter  expression,  and  simplify  to  obtain 

y(Ht)  y(k)  ^  where  qu  =>  Ai^Rsk.  (3.28) 

Thus  we  can  update  by  a  multiple  of  the  direction  vector  qk  in  a  raair- 
ner  typical  of  the  conjugate  gradient  method.  The  scale  factor  jj,  is  already 
aviuiable;  we  need  it  to  update  The  vector  qk  also  occurs  naturally  in  the 
conjugate  gradient  iteration,  so  we  can  produce  at  each  iteration  for  a  fairly 
modest  cost  (one  so-called  dense  vector  triad  per  iteration).  Depending  on  the 
number  of  iterations  required  to  achieve  convergence,  this  cost  may  or  may  not 
be  less  than  the  cost  of  the  single  nxn  sparse  triangular  solve  needed  to  deter¬ 
mine  y  once  the  iterations  are  complete.  In  any  case,  the  difference  in  the  two 
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approaches  is  not  likely  to  be  a  significant  portion  of  the  overall  computational 
effort,  and  both  methods  have  performed  with  comparable  accuracy  and  speed 
in  our  experiments. 

There  is  one  other  aspect  of  Algorithm  BNP  which  differs  from  the  tradi¬ 
tional  conjugate  gradient  algorithm.  Let 

i»  =  (;  +  l^K)rS‘>-S  (3.29) 

represent  the  defect  associated  mth  the  BNP  system  (we  itse  the  term  defect 
to  avoid  confusion  with  the  least  squares  residual  r).  Recall  that  the  quantity 
i/jfz/k  is  needed  within  a  conjugate  gradient  iteration  to  compute  various  scale 
fetors;  it  is  also  used  to  test  for  convergence.  If  we  were  to  apply  the  classical 
conjugate  gradient  algorithm  to  the  BNP  system,  we  would  compute  the  defect 
using  an  update  of  the  form 

+  otfcii,  (3.30) 

where  =  (/  +  V^y’)s*  is  a  direction  vector,  a*  is  a  scale  factor,  and  both 
these  quantities  are  needed  elsewhere  in  the  iteration.  Barlow,  Nichols,  and 
Pleinmons  take  a  different  approach,  however.  Define 

(3.31) 

to  be  the  hth  approximation  to  rj.  If  we  initialise  to  the  value  required  by 
its  definition,  we  am  show  by  an  argument  similar  to  the  one  above  that  the 
iterates  satisfy  the  recursion 

(3.32) 
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We  can  now  use  to  compute  the  defect.  Begin  by  substituting  equa¬ 
tion  (3.26)  into  (3.29)  to  find  that 

=  (3.33) 

Then  use  the  definition  of  given  in  (3.31)  to  obtain 

=  (3.34) 

This  method  of  computing  the  defect  requires  the  same  computational  effort  as 
the  standard  method  of  explicitly  using  a  scale  factor  and  direction  vector. 

We  summarize  the  complete  algorithm  in  table  3.1.  This  particular  outline 
obscures  opportimities  to  avoid  redundant  calculations,  but  will  prove  useful  for 
the  analysis  in  chapter  4.  See  the  original  paper  [4]  for  a  version  suitable  for 
implementation. 

Our  expeiimenta  suggest,  however,  that  equation  (3.34)  is  somewhat  un¬ 
stable:  we  obtained  superior  results  with  a  conventional  update  of  the  defect, 
especially  on  large  problems  (the  BNP  method  of  updating  the  vector  y  appears 
to  present  no  difiBculties).  Apparently  the  defect  gradually  drifts  firom  the  cor¬ 
rect  value,  causing  the  direction  vectors  and  scale  factors  to  become  inaccurate, 
and  resulting  in  slower  convergence.  Barlow,  in  a  private  communication  [3], 
hes  observed  that  this  is  most  likely  because  the  independent  updates  of  ri, 
ra,  and  y  do  not  enforce  the  relatiotiships  which  must  exist  among  these  three 
quantities.  Fortunately,  the  problem  is  quite  easy  to  correct  (see  chapters  5  and 
6):  a  traditional  update  of  the  defect  improves  performance,  while  preserving 
the  spirit  of  the  BNP  algorithm. 

We  remark  in  passing  that  algorithm  BNP  has  one  other  interesting  prop¬ 
erty.  The  quantity  is  defined  to  be  equal  to  ca  -  Additionally,  a 


44 


Table  3.1:  Algorithm  BNP  (Problem  LSE) 


1.  Initialize: 

(a)  arbitrary  (normally 

(b)  —  Aiy(®^)  =  ci  —  Giy^®)  (normally 

(c)  =  cj  —  Ajy^®)  =  cj  —  Gjy^®^ 

(d)  I/O  =  r{®^  +  {vk  is  the  defect  ( J  + )rl''>  - 

(c)  ao  =  Vo  (sk  is  the  direction  vector) 

2.  For  h  s  0, 1,. . . ,  until  v^Vk  <  tolerance: 

(a)  7*  =  +  Y^Y)sk 

(b)  -  'fkSk 

(c)  y<*+^)  =  y W  +  'ykA.i^Ksk 

(d)  r5*+'>=:rf -7fcl'afc 

(e)  !/*+!= 

(f)  ^k+\  =  vl^^uk^xfvlvk 
iz)  ajt+l  =  I/*+l  +  fik-kl^k 


simple  induction  argument  confirms  that  the  vector  r^\  the  BNP  xmknown, 
satisfies  a  similar  natural  relationship:  =  ci  -  This  means  that  the 


vector 


l4‘' 


is  p^ecisely  the  residual  =  c  —  associated  with  the  l:th 
approximation  to  the  solution  vector  y.  Of  the  four  basic  iterative  algorithms 
we  will  consider,  algorithm  BNP  is  the  only  one  with  this  satisfying  property. 


3.3  Other  Iterative  Algorithms 

There  are,  of  course,  other  established  iterative  algorithms  for  solving  problem 
LSE.  Two  such  methods,  p-cyclic  SOR  and  block  AOR,  are  parameter-based  lin¬ 
ear  stationary  methods  applied  to  the  modified  Kuhn-Tucker  equations  (3.12). 
The  third,  a  preconditioned  form  of  the  weighting  algorithm,  is  closely  related 
to  the  modified  Kuhn-Tucker  formulation.  We  introduce  all  three  algorithms 
here,  with  an  eye  toward  comparing  and  contrasting  these  methods  with  BNP 
in  chapter  4. 

Given  a  non-singular  linear  system  Cz  =  /,  let  C  =  D  ~  i — 1/  define  a  split¬ 
ting  of  the  coefficient  matrix  into  block  diagonal,  lower  triangular,  and  upper 
triangular  parts  respectively.  The  well-known  block  SOR  algorithm  (see  Young 
[38],  and  Hageman  and  Young  [14])  is  then  defined  by  an  iteration  involving  a 
relaxation  parameter  u'. 

{D  -  wX)^(‘+‘>  =  [(1  -  u)D  +  uiU]  z(‘‘>  -b  (jf.  (3.35) 

If  we  consider  the  modified  Kuhn-Tucker  system  (3.12),  two  plausible  choices 
of  the  block  diagonal  matrix  D  produce  three-  and  two-block  SOR  methods 
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respectively: 


JD3- 

\  Ax 

I 

and  Di  = 

Ai 

Aj  / 

L 

[  J 

While  little  is  known  about  the  optimal  iteration  parameter  u  for  arbitrary 
linear  systems,  the  coefficient  matrix  in  the  modified  Kuhn-Tucker  system  enjoys 
some  special  properties  which  make  a  more  complete  emalysis  possible.  When 
the  coefficient  matrix  of  the  modified  Kuhn-Tucker  system  is  partitioned  using 
either  D3  or  D],  it  is  a  so-called  p-cyclic  matrix,  and  the  associated  SOR  algo¬ 
rithms  are  known  as  p-cyclic  SOR  methods.  An  elegant  theory,  due  largely 
to  Young  [39]  and  Varga  [36],  [37],  relates  the  spectnim  of  the  SOR  iteration 
matrix  to  that  of  the  corresponding  Jacobi  iteration  matrix.  This  p-cydic  the¬ 
ory,  combined  with  ^>ecial  properties  of  the  Jacobi  iteration  matrices  associated 
with  Di  and  Di,  leads  to  a  number  of  important  results  for  p-cycIic  SOR  applied 
to  problem  LSE.  In  particular,  Plemmons  [30]  has  established  that  2-cydic  SOR 
^pU^  to  LSE  converges  for  sufficiently  small  values  of  (there  may  or  may 
not  be  values  of  a;  for  which  3-cyclic  SOR  converges).  Additionally,  Markham, 
Nemnann,  and  Plemmons  [24]  have  shown  that  the  asymptotic  convergence  of 
optimal  2-cyclic  SOR  is  superior  to  3-cycUc  SOR.  Pierce,  Hadjidimos,  and  Plem- 
moos  [29],  and  Eiermann,  Niethammer,  and  Ruttan  [10],  e3Ud>lish  results  for 
more  general  problems,  demonstrating  the  importance  of  the  special  properties 
of  the  modified  Kuhn-Tucker  system. 

Another  approach  to  solving  the  modified  Kuhn-Tucker  equations  is  a  two 
parameter  generalisation  of  SOR  known  as  Accelerated  Over-relaxation  or 
AOR  (see,  for  example,  Hadjidimos  [13]): 

(D  -  *  ((1  -  w)Z?  +  (w  -  0)L-^-u;U]  -!-«/.  (3.3?) 
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Note  that  the  special  case  a;  =  is  block  SOR. 

Again  we  consider  the  blockings  given  by  (3.36),  referring  to  the  correspond¬ 
ing  algorithms  as  3-AOR  and  2-AOR  respectively.  The  optimal  choice  of  pa¬ 
rameters  for  2-AOR  occurs  somewhere  on  the  line  u  =  0  (see  Papadopoulou, 
Saridakis,  and  Papatheodorou  [28]),  so  optimal  2-AOR  coincides  with  optimal 
2-cyclic  SOR.  Optimal  3-AOR,  however,  does  not  necessarily  occur  when  u  = 

In  fact,  Papadopoulou  et  al,  establish  in  [28]  that  under  some  very  restrictive 
(and  highly  technical)  conditions,  the  asymptotic  convergence  of  3-AOR  may 
be  better  than  optimal  2-AOR  (and  therefore  2-cyclic  SOR). 

Table  3.2  is  an  explicit  description  of  the  3-AOR  algorithm  applied  to  prob¬ 
lem  LSE.  Remember  that  the  matrices  K  and  R  are  defined  in  equations  (3.12) 
and  (3.21).  Technically,  we  could  initialize  and  to  arbitrary  values.  As 
defined  in  the  table,  however,  the  initial  values  axe  both  mathematically  plau¬ 
sible  and  consistent  with  the  BNP  iteration  (unlike  BNP,  however,  subsequent 
iterates  do  not  satisfy  =*  C|  —  or  =  ca  -•  Aay^*^).  In  any  case, 
our  experiments  suggest  the  method  u  not  particularly  sensitive  to  the  choice 
of  initial  iterates. 

We  can  solve  problem  LSE  in  yet  another  way,  this  time  by  viewing  the 
problem  in  an  entirely  different  manner.  Let  r  be  a  large  positive  constant,  and 
consider  the  ordinary  least  squares  problem 


minimize  [jWy  -  6|ja, 


where  W  and  S  are  given  by 


’  tE  ' 

a 

rb 

G 

»  « 

,  6  = 

c 

»  « 

(3.38) 


(3.39) 
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Table  3.2:  Tbree-Block  AOR  (Problem  LSE) 


1.  Initialize: 

(^) 

y(®)  arbitrary 

(b) 

rS'’'  = 

ca  —  Aay^®l 

(c) 

rpU 

Cl  -  Giy(®l 

(d) 

A(®)  arbitrary 

(e) 

r  At")  1 

1  J 

2.  For 

i;  a  0, 1,. . . ,  until  convergence: 

(») 

-  -  «4‘’)  -  -  if'-l*’) 

(b) 

rS***’ 

a  ca  -  Aa  [(1  - 

(c) 

(d) 

as  (1 

(e) 

-j- 

+ 

1 

T 

(f) 

4*^1) 
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This  is  a  special  case  of  a  so'called  weighted  least  squares  problem,  or  WLS 
(see,  for  example,  Bjorck  [7],  or  Golub  and  van  Loan  [12]). 

Note  that 

IIH'K  -  m  =  r»||£»  -  6|||  +  lIGs)  -  41  (3.40) 

This  means  that  any  candidate  solution  y  which  fails  to  satisfy  Ey  will  cause 
the  quantity  \\Wy  ~  SlU  to  be  exceptionally  large.  Thtis,  r  acts  as  a  penalty 
parameter  forcing  y  to  come  very  close  to  satisfying  the  equilibrium  constraint, 
and  we  might  expect  that  the  solution  to  this  weighted  least  squares  problem 
is,  to  a  good  q>proximation,  the  solution  to  problem  LSE.  Given  a  reasonably 
conditioned  problem  and  a  well  chosen  value  of  r,  the  analysis  in  Barlow  [2] 
suggests  that  this  is  in  fact  the  case. 

This  ^proach  to  solving  problem  LSE,  known  as  the  weighting  method, 
has  a  fairly  long  history  (see  Bjorck  [7]).  Most  algorithms  based  on  this  method 
solve  the  weighted  least  squares  problem  (3.38)  by  some  direct  method  (e.g. 
an  orthogonal  factorization).  Here,  however,  we  consider  an  iterative  approach 
based  on  the  factored  form  of  the  normal  equations 

(3.41) 


Anticipating  the  likelihood  th^  this  system  is  poorly  conditioned,  we  seek  to 
precondition  the  problem. 

To  construct  a  preconditioner,  hist  note  that  the  matrix  W  as  defined  above 

r  £?  1 

is  nothing  more  than  a  scaled  version  of  the  matrix  q  which  appears  in  the 

»  4 

Kuhn>Tucker  equations  (2.6).  Thus,  we  can  reorder  and  repartition  it  exactly 
as  in  §3.1.  in  particular,  reorder  the  rows  of  G  and  c,  and  partition 


(?« 


G\ 

Cfa 


(3.42) 


SO 


so  that  the  matrix  defined  by 


tE 

Gi 


(3.43) 


is  square  and  non-singular.  Define  W2  ==  we  now  have 


W  = 


Wi 

W, 


(3.44) 


Given  this  partitioning  of  we  can  rewrite  the  normal  equations  as 


{W^Wx  +  W^Wi)y  =  WH. 


(3.45) 


Now  precondition  with  Wi  in  the  standard  way: 


Wx^W^Wi  +  =  W{"^Wn, 


(3.46) 


or 


(/  +  C**C)w«W'fn4^S,  whereCaW^fS  waWiy.  (3.47) 

We  can  now  ^)ply  the  conjugate  gradient  algorithm  to  this  system,  leaving  the 
co^cient  matrix  in  factored  fcMcm.  We  refer  to  this  method  of  solving  problem 
LSE  as  the  preconditioned  weighting  method,  or  Pwgt. 

Superficially,  the  system  in  (3.47)  appears  to  mimic  the  BNP  system  (3.21). 
Remember,  however,  that  this  is  not  a  reduced  order  problem:  the  size  of  the 
system  remains  nxn.  Still,  there  is  a  relationship  between  BNP  and  the  precon¬ 
ditioned  weighting  method.  We  will  make  the  connection  explicit  in  the  next 
chapter. 
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4.  Comparison  of  the  Algorithms 


In  chapter  3,  we  introduced  algorithm  BNP  as  it  j^pears  in  [4],  and  outlined 
three  other  iterative  algorithms  for  solving  problem  LSE:  p-cyclic  SOE,  block 
AOR,  and  a  preconditioned  form  of  the  weighting  method.  We  are  now  inter¬ 
ested  in  comparing  algorithm  BNP  to  each  of  the  other  three  methods. 

The  relationship  between  BNP  and  p-cyclic  SOR  is  already  well  under¬ 
stood:  despite  the  elegant  convergence  theory  for  p-cyclic  SOR,  BNP  is  su¬ 
perior  in  exact  arithmetic.  This  was  established  by  Freund  [11]  for  uncon¬ 
strained  least  squares  problems,  and  by  Barlow,  Nichob  and  Plemmons  [4]  for 
equality  omstrained  problems.  More  precisely,  if  y(°),  rf  ^  ss  cj  *“  and 

«  Cl  -  (ny(^)  serve  as  initial  iter^es  for  both  BNP  and  2-cyclic  SOR,  then 
the  iterates  at  suUequent  steps  satisfy  the  inequality 

l|e-Cvg'U,Sl|c-C»g5i’ll,-  (4.1) 

While  the  authors  state  their  result  explidtly  only  for  2-cycIic  SOR,  it  holds  for 
3-cyclic  SOR  as  weU.  This  is  a  corollary  to  the  main  result  in  §4.2,  but  it  is  no 
surprise:  we  have  already  mentioned  (see  [24])  that  the  asymptotic  convergence 
of  2-cyclic  SOR  is  better  than  the  3-cyclic  approach. 

In  thb  chapter,  we  consider  the  relatiooship  cf  algoritbm  BNP  to  the  other 
two  types  of  iterative  methods  outlined  in  chapter  3.  Adap  ting  the  arguments  of 
both  Freund  and  the  authors  of  BNP,  we  extend  their  work  to  show  that  BNP  b 
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also  superior  to  block  AOR.  We  prove  this  for  unconstrained  problems  in  §4.1, 
and  for  constrained  problems  in  §4.2.  Then,  in  §4.3  we  show  that  algorithm  BNP 
may  be  viewed  as  the  limiting  case  of  the  preconditioned  weighting  method. 

Note  that  all  the  methods  require  the  same  pre-iteration  processing  (fac¬ 
toring  E  and  and  forming  the  preconditioner  by  augmenting  the  factored 
equilibrium  matrix)  and  essentially  the  same  computational  effort  per  iteration 
(two  triangular  solves  and  two  matrix-vector  products  dominate  the  computar 
tional  effort).  Thus,  comparing  the  total  number  of  iterations  needed  to  achieve 
convergence  is  a  meaningful  way  to  compare  the  relative  performance  of  these 
methods. 


4,1  BNP  vs  AOR:  Ordinary  Least  Squares 

Our  goal  is  to  establish  that  BNP  applied  to  problem  LSE  is  superior  to  block 
AOR  in  exact  arithmetic,  by  proving  a  result  analogous  to  equation  (4.1).  Since 
optimal  2-AOR  cmnddes  with  <^timal  2-cyclic  SOR,  it  suffices  to  consider 
3- AOR  The  pro<ff  follows  the  spirit  the  arguments  in  Freund  [11]  and  Barlow, 
Nichols,  and  Plemmons  [4],  and  proceeds  in  three  steps: 

1.  Given  an  ordinary  least  squares  problem  (no  constraints),  we  consider 
algorithm  BNP  applied  to  the  prcbiem,  and  examine  the  properties  of  the 
iterates. 

2.  Given  the  sune  ordinary  least  squares  problem  cemsidered  in  step  1,  we 
study  the  properties  of  3-AOR  applied  to  the  problem,  and  establish  that 
algorithm  BNP  b  superior  to  3-AOR  for  thb  unconstrained  problem. 
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3.  Given  an  equality  constrained  least  squares  problem,  we  construct  a  re¬ 
lated  ordinary  least  squares  problem.  We  then  establish  a  relationship  be¬ 
tween  the  iterates  generated  by  BNP  2q)plied  to  the  constrained  problem 
and  the  iterates  generated  by  BNP  applied  to  the  unconstrained  problem. 
We  do  the  same  for  3- AOR  as  well.  Then  the  results  of  step  2  allow  us  to 
conclude  that  BNP  is  superior  to  3-AOR  for  problem  LSE. 

We  complete  steps  1  and  2  in  this  section.  In  the  next  section,  we  relate  the 
constrained  problem  to  an  ordinary  least  squares  problem  to  complete  step  3. 
Tc>  prevent  confusion,  we  use  a  tUde  (’)  over  many  of  the  quantities  associated 
with  the  unconstrained  problem  of  thb  section. 

Begin  with  the  ordinary  least  squares  problem 

minimise  QAx  -  S}]},  (4.2) 

where  ^  has  full  column  rank.  View  this  problem  as  a  ^constrained”  least 
squares  problem  with  zero  constraints.  The  choice  of  z  as  the  unknown  will 
prove  convenient  in  the  ne.xt  section. 

By  analogy*  with  (S\2),  reorder  the  rows  of  X  and  S  as  necessary,  and  partition 
the  coefficient  matrix 

so  that  Ax  is  square  and  non-singular.  Partition  the  vector 


compatibly.  The  resulting  modified  Kuhn-Tucker  equations  are  similar  to  those 
in  (3.4),  except  the  identity  matrix  replaces  the  matrix  A',  and  there  is  uo 
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Lagrange  multiplier: 


‘ii  0  /  ■ 

’  X 

\k] 

/  0 

fi 

= 

k 

0  4  if  J 

.  h  . 

.  0  . 

Note  that  BNP  and  3-AOR  as  outlined  in  §3.2  and  §3.3  are  perfectly  well 
defined  for  this  system.  To  obt^  the  unconstrained  equivalent  of  the  BNP 
system  (3.21),  simply  replace  K  with  the  idoitity  matrix  /,  and  substitute  Ai, 
»'3»  *  for  Ai,  Aj,  hi,  ca,  ri,  rj,  and  y  t«pectively.  In  particular, 

by  analogy  with  equation  (3.20),  note  that 

i^«iaAr'.  (4.6) 

The  BNP  system  f(»  the  unconstrained  problem  can  then  be  written 

(/  +  ?^?)h  =  k  where  K  =  F(fSi  -  hj).  (4.T) 

Now  apply  the  ocmjugate  gradient  algorithm  as  9ven  in  §3.2.  For  convenience, 
we  describe  the  algorithm  explicitly  in  table  4.1. 

The  AOR  algorithm  is  just  as  simple  to  modify,  but  the  absence  of  changes 
its  appearance  somewhat.  See  table  4.2  for  an  explicit  description  of  the  algo¬ 
rithm. 

We  are  now  prepared  to  proceed  with  the  proof.  In  the  discussion  below,  we 
use  the  following  not^ion:  if  tu  b  a  vector,  and  5  a  set  of  vectors,  then  to  4*  5 
lepreseats  the  set  {w  -f  s  :  s  €  5).  Similarly,  if  B  b  a  then  BS  is  the 

set  {Bs  :  $  €  5}.  If  5  b  convex,  so  are  t2)  +  5  and  BS\  if  5  b  a  vector  subspace, 
SS  b  a  subspace  as  welL 
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Table  4.1:  Algorithm  BNP  (Unconstrained  Least  Squares) 


1.  Initialize: 

(a) 

arbitrary  (normally  ^ 

5.) 

(b) 

fi»>  = 

=  Si  -  Aix(®l  (normally  = 

0) 

(c) 

ff  = 

=  Sj  -  Aax^®! 

(d) 

i5b  = 

fW  4.  (j/j^  is  the  defect 

(e) 

5o  = 

A}  (J*  is  the  direction  vector) 

2.  For  i 

fc  =  0 

,  If.  • .  I  until  i>f  A*  <  tolerance: 

(a) 

II 

i>^hlSHl  +  f''Y)h 

(b) 

(c) 

*(*+0  -  j;(*)  4. 

(d) 

(«) 

(0 

0k+\ 

=  t>I+ih+ilv^i>k 

(g) 

h+i 

=  A*+1  +  j3*+iS* 

Table  4.2:  Three-Block  AOR  (Uaconatrained  Least  Squares) 


1.  Initialize: 

(a)  arbitrary 

(b)  i^“> = k 

(c)  f =  k 

-iixO) 

2.  For  k  =s  0,lr 

. . ,  until  convergence: 

(a)  SB 

Ar’(^  -  H‘>) 

(b)  4^“^^  « 

5a  —  via  |(l 

(c)  = 

(d)  *(*+*)  = 

(1  “  w)x^*)  + 

{«)  fi**'’  = 

(f)  = 

(1  +ufS***“ 

Let  xW,  ff  ^  s=  Si  -  and  =  S2  -  define  a  single  set  of  initial 

iterates  for  both  BNP  and  3-AOR.  Define  the  following  vectors  in 

VQ  =  S1+FS3  (4.8) 

«o  =  t;o-(Ai +y^A2)x^°^  (4.9) 


too  —  Af'tio 

=  A:[\k  +  Y'^k) 

too  =  Ai^t^ 

=  too  -  Aj '(Ai  +  y^A3)x(°^. 

Additionally,  let 


^  =  Ar^f^fAi«Ar^y^Aj, 

noting  that 


too  *  too  -”  (/  + 

Finally,  define  a  sequence  of  Krylov  subspaces: 


(4.10) 


(4.11) 


(4.12) 


(4.13) 


So  =  {0} 

Sh  =  span  |too,  Btoo, . . . , -B^'^too} .  (4.14) 

Lemma  4,1  (Freund)  Lei  x^®!,  4°^  =  S3  -  Asx^®^  and  =  Si  -  Aix(°)  be 
initial  iterates  for  dgorithm  BNP  applied  to  the  ordinary  least  squares  problem 
(4>8).  Then  the  kth  iterate  lies  in  the  setx^^^+Sk.  Moreover,  x^*^  minimizes 
the  residual  HfUs  =  jjS-  Ax||3  over  all  x  in  x(®l  +  St,, 


Proof.  See  [11].  0 
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Theorem  4.1  Let  =  Sj  -  and  s=s  he  initial 

iterates  for  both  BNP  and  S~AOR  applied  to  the  ordinary  least  squares  problem 
(4'^)'  het  Xshp  ^AOR  the  kth  iterates  generated  by  the  two  algo- 

rithms  respectively^  and  define  ij^ffp  =  5  —  ^bnp  =  ^  *“  ^aor  to 

the  associated  residuals.  Then  Ute  iterates  satisfy  the  inequality 

<  ll^AoA^lla 

in  exact  arithmetic,  regardless  of  the  AOR  iteration  parameters. 

Proof.  By  Lemma  4.1,  it  suffices  to  prove  that  the  AOR  iterate  x^oA* 

+  Sk.  Suppressing  the  “AOR”  subscripts  for  simplicity,  we  accomplish  this 
by  establishing  the  following  relationships: 

(a)  x<*l  €  x^®^  +  Sh~.x 

(b)  i2-f?’*^€A2(x(®)  +  5*.i) 

(c)  5,  -  €  AjCxW  +  Sk-x) 

(d)  lx  +  €  VO  -  y^Aa(x(®)  +  5*-x) 

(e)  k  +  €  VO  -  y^A,(x(°)  +  St-i) 

(f)  5i  ~  €  Vo  -  Y^Aiix^^^  +  Sk-x) 

(g)  lx  -  €  Axx<®l  +  span  {vo}  + 

(h)  xM)€x(®)  +  5fc 

Our  goal  is  to  establish  the  first  of  these  relationships;  the  others  axe  means 
toward  that  end.  We  argue  by  induction.  Listing  the  early  iterates  explicitly. 
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we  have: 


a;(i) 

^2 


f(i) 

“2 


*^2 


-(0) 

rj '  -  ujvo 


-p(i+i)  _  3.(0)  ^ 

Use  these  values  to  confinn  that  each  of  the  inductive  hypotheses  holds  for 
kssl.  Now  assume  all  eight  hypotheses  hold  for  a  fixed  k,  and  consider  A  +  1. 

Proof  of  (a):  Smce  +  Sk-i  is  contained  in  +  Sk,  inductive  assumption 
(a)  tells  us  that  is  an  element  of  +  5*.  By  assumption  (h), 
is  also  in  this  set.  Thus,  by  convexity,  x(*+^)  s=  (1  is  in 

®(°)  +  5*  as  required. 

Proof  of  (b):  Equation  2(b)  of  table  4.2  tells  us  that 

By  assumption  (a),  we  know  Aaa:^*)  is  contained  in  +  Sk-i),  which 
in  turn  is  a  subset  of  +  -S’*).  Moreover,  i42x(*+i)  is  in  the  latter 

set  by  inductive  assumption  (h).  Thus,  by  convexity,  ^2  ~  is 
+  5*)  as  required. 


60 


Proof  of  (c):  Prom  assumption  (c)  and  the  containment  argument  used  above, 
we  have  that  Sj  —  is  in  +  5*).  The  proof  of  (b)  tells  us  that 

Sj  —  is  in  the  latter  set  as  well.  Rearrange  equation  2(e)  of  table  4.2 
to  see  that  Sj  —  is  a  convex  combination  of  Jj  —  and 
and  so  is  in  +  5*)  as  required. 

Proof  of  (d):  From  the  definition  of  Vo  hi  equation  (4.8),  and  the  definition  of 
in  table  4.2,  we  find  that 

Si  +  =  t)o  -  [(1  -  ^)x(*)  +  . 

Assumptions  (a)  and  (h)  combined  with  convexity  then  give  us  the  re> 
quired  result. 

Proof  of  (t)  and  (f):  Similar  to  the  proof  of  (c). 

Proof  of  (g):  By  assumption  (g),  there  is  a  scalar  a,  and  a  vector  zi  in 
such  that 

Si  —  +  ovo  4*  «i. 

By  the  proof  of  (f),  there  is  a  xa  6  Y^AiSh  such  that 

Si  -  =  tio  -  Y^Atx^^^  +  X3. 

Let  X  3  (1  ~u;)xi  -hu/xa,  and  note  that  x  €  V^Aa^s.  Rearrange  equation 
2(f)  of  table  4.2  to  obtain 

Si  -  3  (1  -  u;)(Si  -  f[*^)  +  w(Si  -  f[*'^^^). 

Remembering  that  wo  3  60  —  (Ai  +  y^A3)xt°),  use  the  expressions  for 
Si  —  f[*^  and  Si  —  to  find  that 

Si  —  3  AiX^®^  +  Ck(1  —  w)Vo  +  U/t>o  "I"  X, 
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which  is  in  +  span  {t;o}  +  y^AiSk  as  required. 

Proof  of  (h):  From  table  4.2  we  have  By  the  proof 

of  (g),  there  exists  a  scalar  7  such  that 

li  -  +  7t;o  +  Y'^A^Sk- 

Recalling  that  two  =  these  two  facts  tell  us  that  is  in 

the  set  +  7two  +  But  two  is  an  element  of  Sk+h  and 

Ai^y^AiSk  =  BSk,  which  is  a  vector  subspace  of  Sfc+j.  Hence 
is  in  x^®^  +  Sk^i  as  required.  □ 

4.2  BNP  vs  AOR:  Constrained  Least  Squares 

In  the  previous  section,  we  proved  that  in  exact  arithmetic,  BNP  applied  to  an 
unconstrained  least  squares  problem  converges  at  least  as  fast  as  3<A0R.  To 
extend  this  result  to  constrained  least  squares  problems,  we  establish  a  connect 
tion  between  problem  LSE  and  a  specially  constructed  unconstrained  problem. 
The  argument  below  is  an  adaptation  of  Th«)rem  2.1  in  Barlow,  Nichols,  and 
PlemmoDS  [4],  and  is  based  on  a  special  case  of  the  classical  nullspace  method 
(see  §5.1), 

Consider  problem  LSE  as  given  in  equation  (2.1),  and  the  associated  modi¬ 
fied  Kuhn-Tucker  equations  (3.4).  Define  the  matrix 

N^A;'J(:^A;'  J  .  (4.15) 

»  4 

Observe  that 

^  Al'  =  =  Q  j  •  (4. 16) 

.  *  J  L  . 
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Thia  means  ~  0  ]  >  =  0*  Moreover,  N  has  n— mi  linearly 

independent  columns,  so  the  columns  of  N  form  a  basis  for  the  nuUspace  of  E. 
By  a  similar  argument,  the  vector 

=  (4.17) 


is  a  particular  solution  of  the  constraint  Ey  —  b.  Note  that  any  vector  satisfying 
the  constraint  can  be  written  as  y  s  y^-^Nx  for  some  choice  of  £,  and  minimizing 
II  Gy  —  cjja  subject  to  the  constraint  amounts  to  minimizing  |lG(yp  +  Nx)  -  cjja 
over  all  possible  x.  The  latter  minimization  is  an  unconstrained  problem  of  order 
n— mt«  Isolating  the  unknown  x,  we  have  the  ordinary  least  squares  problem 

minimize  ((Ax  — 1{\%  where  A  *3  GiV,  S  ==  c  -  Gy,.  (4.18) 


Referring  to  equation  (4.16),  we  find  that 

/  ‘ 

K  J’ 

where  Y  =>  A^Ai^IC  is  as  in  the  BNP  system  (3.21).  Additionally, 


ON 


GxN 

GiN 

(4,19) 


Giy,  =  GiAi^bi  S3 


6 

Cl 


=sct, 


(4.20) 


so  Cl  ~  Giy,  ss  0.  All  of  these  relationships  give  us  an  elegant  form  for  the  mod¬ 
ified  Kuhn-Tudcer  system  associated  with  the  ordinary  least  squares  problem 
pven  in  (4.18): 


'  I  Q  I' 

•  • 

X 

0 

r  0  0 

0  I 

ss 

ca  -  Gay, 

0 

m  at 

(4.21) 
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In  terms  of  the  notation  &om  the  previous  section,  we  have 


il  =  J 

(4.22) 

II 

ea 

(4.23) 

o 

II 

(4.24) 

Sj  s=  Ca  —  AaAj'^fti. 

(4.25) 

Now  let  yl®l  be  an  arbitrary  initial  iterate  for  both  BNP  and  3-AOR  applied 

to  problem  LSE.  There  is  a  unique  x^®)  such  that  yl®l 

correct  value  is  given  by 

=  yp  +  iVx^°l;  in  fact,  the 

x(®)  =  ^Viy‘®)-6i). 

(4.26) 

Use  this  value  of  as  the  initial  iterate  for  BNP  and  S^AOR  applied  to  the 
ordinary  least  squares  problem  (4J8).  We  now  define  the  corresponding  initial 
residual  iterates,  and  relate  the  subsequent  iterates  for  the  constrained  problem 
to  those  of  the  unconstrained  problem. 

Lemma  4.2  Let  =  ca~  ond  s  ci  6«  initial  iietates 

for  BNP  applied  to  problem  LSE.  Define  os  in  cquotion  (4^S6),  and  let 
fj|®^  =3  Sj  ~  AaX^®^  and  =  Sj  —  Aixl®^  be  initial  iterates  for  BNP  applied  to 
the  related  ordinary  least  squares  problem  defined  in  equations  (4^18)  through 
(4>85).  Then  suiscqucnt  iterates  s^isfy  =>  y^  +  Nx^^K 

Proof  Refer  to  tables  3.1  and  4.1.  Remembering  that  a  tilde  ('')  represents 
a  quantity  associated  with  the  ordinary  least  squares  prr^lem,  we  prove  by 
induction  that  yl*^  =  y>  +  Nx^^\  r^^  =  sa  rjl‘\  Vk  =  h,  and  Sk  =  Si 

for  all  k. 
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First  consider  it  =  0.  We  know  yW  =  by  tbe  definition  of 

Since  Ai=  J  and  ii  =  0,  we  have  fi  —  which  is  —  6i).  But 

K^Ai  =  G\  and  K^bi  =  ci,  sc  =  cx  —  G\y^°\  which  is  as  required. 
Additionally,  simple  substitution  tells  us  that  is  cj  —  A^ly^  +  IV® This 
is  —  A^y^^\  which  is  as  required.  Finally,  note  that  the  initial  defects  and 
initial  direction  vectors  satisfy  ^  and  so  ~  so  trivially. 

Now  aastune  y(*l  =  yp  +  Nx^^\  5=  =  f[^\  ujt  —  ihtt  a*  s=  J* 

are  all  true  for  a  fixed  k,  and  consider  k+l.  When  BNP  is  applied  to  the 
ordinary  least  squares  problem,  we  have  Y  =  AsAJ"'  =  Y.  This,  combined  with 
the  inductive  assumptions,  ^ves  us  7*  a  immediately,  horn  which  we  get 
a  a  *^*+1  “  We  can  then  conclude  that 

=  ^Hit  which  we  obtain  sji^i  =  shi  as  required. 

Finally,  we  know  »  y^*^  4*  7* Af' ifsi.  Recalling  that  Ax  =  /,  we  have 
Al^R  a  Ai^N.  Moreover,  s^  a  J*  and  yt‘J  »  Pp  4*  by  hypothesis.  So 
y(Hi)  S3  y^  4.  +  %A{^h)%  which  b  y,  +  as  required.  □ 

Lemma  4.3  Let  *  ca  -  Ajy^®^  =  cx  -  Gxy^®^  ond  be  initial 

iUmtei  for  3-AOR  applied  to  problem  LSE.  Define  ®i®)  os  in  equ<Uion  (4-26), 
and  let  ®t®),  33  Jj  --  Aai^®^  and  f[®^  a  &x  —  AxX^®)  be  initial  iterates  for 

S^AOR  (xoith  the  same  choice  ofu  and  0)  applied  to  the  related  ordinary  least 
squares  problem  defined  by  efoations  (4-lB)  through  (4-SS).  Then  subsequent 
iterates  satisfy  y^**^  =*  Vp  +  iV®W. 

Proof.  Refer  to  the  description  of  the  alg<»ithms  given  in  tables  3.2  and  4.2. 
We  estal^h  by  induction  that  yi*)  =  yp  +  Nxl*\  ss  and  a  f[*\ 
The  case  h  sa  0  is  in  the  proof  of  Lemma  4.2. 
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Now  asstime  =  yp-^Nx^*‘\  =  f^\  and  aie  true  for  a  fixed 

k,  and  consider  i  + 1.  Since  A\  I  and  5  =s  0,  we  know  that 
for  all  k.  By  the  inductive  assumptions,  this  means  =  — r[*\  Af¬ 
ter  this  substitution,  the  defining  equation  y^*+il  =  becomes 

y(*+i)  =  ^  We  Alan  know  that  y^*l  ss  y^  +  iVx^*J  by  inductive  as¬ 
sumption.  Thus,  we  find  that  the  equation  y{*+'l  —  (1  —  be¬ 

comes 

y(*-H)  —  y^  ^  jV  _  u>)x^*^  -hu/® 
which  is  y,  -f  as  required. 

We  obtain  by  a  straightforward  substitution.  Observing  that 

=:  a  simple  substitution  produces  =  rj*'*’*^  as  well.  □ 

Lemma  4.4  Lety^®l,  as  C3- Ajy^®!,  ond  r[®^  =  Cj  -Giy^®^  he  initial  uerotes 
for  eUher  BNP  or  S-AOR  applied  to  problem  LSE  (the  choice  of  in  S-AOR 
is  immaterial).  Define  as  in  equation  (4‘S6),  and  let  =s 

and  f[®^  «  St  —  ^tx^®l  he  initial  iterates  for  the  same  algorithm  applied  to  the 
related  ordwary  least  squares  problem  defined  by  equations  (4A8)  through  (4.B5}. 
Then  the  residuals  c  —  Gy^*^  ond  S  —  Ax^*^  are  equal 

Proof.  By  Lemmas  4.2  and  4.3,  we  know  that  y^*J  =  +  Nx^^K  This  means 

that  c  -  Gy^*l  =5  (c  -  Gy,)  -  GiVxt*^.  But  GN  as  A  and  c  -  Gy,  as  5  by  (4.18), 
so  the  result  follows.  □ 

Theorem  4.2  Let  y^®^,  «  Cj  —  Aay^®\  and  r|®^  sa  ct  —  Gty^®l  be  initial  it¬ 

erates  for  BNP  applied  to  problem  LSE.  Let  the  same  quantities  with  an  ar¬ 
bitrary  scree  os  initial  iterates  for  S*AOR.  Let  y^^p  and  yj^p  represent 
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the  kth  solution  iteratts  generated  by  the  two  algorithms  respectively,  and  define 
^]slrp  ~  c  —  Gyj^lfp  and  r%n  =  0—  Gy%p  to  be  the  associated  residuals.  Then 
the  iterates  satisfy  the  inequality 

m  exact  arithmetic,  regardless  of  the  AOR  iteration  parameters. 

Proof.  Apply  BNP  and  3-AOR  to  the  related  ordinary  least  squares  problem 
defined  by  equations  (4.18)  through  (4.25).  By  Lemma  4.4  we  have 

r%*i‘  =  i-Ax!Sk>. 

The  r^ult  then  follows  from  Theorem  4.1.  □ 

While  the  theorem  applies  to  calculations  performed  in  exact  arithmetic,  it 
suggests  that  BNP  should  outperform  3*A0R  on  problem  LS£.  Our  numerical 
experiments  (see  chapter  6)  suggest  that  tins  is  in  fact  the  cose. 

Finally,  we  obtain  as  a  corollary  the  fact  that  algorithm  BNP  is  superior  to 
optimal  3^yclic  SOR.  Since  we  know  that  optimal  2>cyclic  SOR  is  asymptot¬ 
ically  (aster  than  3-cydic  SOR  for  problem  LS£,  this  merely  formalises  what 
one  would  expect. 

Corollary  4.1  Lei  «  ca  —  ond  =  ci  —  Gjy*®*  be  initial 

iterates  for  Bl^P  applied  to  problem  LSE.  Let  the  same  quantiiies  with  an  ar6i- 
Iroiy  serve  as  initifd  iterates  for  S^clic  SOR.  Let  y^Ap  y^R  ^P^ant 
the  idh  solution  iterates  generated  by  the  two  algorithms  respectively,  and  define 
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^BNP  =  c  ”  ^Valrp  ~  associated  residuals.  Then 

the  iterates  satisfy  the  inequality 


in  exact  arithmetic,  r^ardless  of  the  value  of  to. 

Proof.  The  result  in  theorem  4.2  hold  regardless  of  the  AOR  parameters.  In 
particular,  it  holds  for  '(he  special  case  to  ^  0,  which  is  3>cyclic  SOR.  □ 

4*3  BNP  and  Preconditioned  Weighting 

It  is  at  least  mildly  surprising  that  algorithm  BNP  is  also  closely  related  to  the 
preconditiooed  weighting  method  dr^oibed  in  §3.3;  we  detail  the  connection  in 
this  section.  The  analysis  holds  when  the  preconditioner  b  formed  by  inter' 
lacing,  but  the  permutation  matrices  used  to  define  the  iat^lacing  unnecessarily 
complicate  the  notation.  F<»  that  reason,  we  proceed  using  the  augmentation 
tedmique  based  on  the  trapezoidal  nratrix  Bt  (§3>l). 

r  rE  1 

Following  the  notation  in  §3.3,  we  begh^  with  the  matrix  »  q  and 
the  vector  5  s  ,  We  then  ^ply  the  conjugate  gradient  algorithm  to  the 

»  4 

factored  linear  system 

(/  -h  C^Otv  «  (4.27) 

tE 

where  Wi  s  «  and  iVj  »  <?3  are  the  upper  and  lower  blocks  of  VV'  tespec' 
»  *  J 

tivdy,  C  b  the  matrix  and  the  vector  w  »  Wiy  b  the  new  uidmown. 

Recall  from  §3.3  th^  Wi  b  noQ-.9ingular. 

Now  let  E  be  the  trapezmdal  matrix  £|  a  [  Et  ]  defined  in  equa¬ 
tion  (3.6).  If  we  form  IVt  and  IV3  using  the  an^entation  technique  described 
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in  §3.1,  we  obtain 


VVi  =  [’’^‘  r£;«l 
W2  s  I  Gil  On  I , 


where  G  ia  g^ven  by 


P  _  Oil  G32 
Uss  n  ’ 

Uii 

In  this  case,  the  righthand  side  becomes 

W,-’‘K^S=f’’‘l+C’'c 


(4.28) 

(4.29) 


(4.30) 


(4.31) 


We  can  now  explicitly  calctilate  the-  inverse  of  Wi  : 


(4.32) 


This  in  turn  allows  ns  to  rewrite  ihe  pn»:onditioned  system  (4.27)  ais 


r-^Y^v  (/+y^y)j[ 


(4.33) 


where 


»  * 

U>  la  “ 

j 

V  = 


t64- 

cx  -f  y^ca 


(4.34) 

(4.35) 

(4.36) 


and  y  is  as  in  equaUon  (3.^).  In  the  limit  as  r  >-•  co,  the  coefficient  matrix  is 


(4.37) 


(z  +  y^y)  • 


Loosely  speaking,  therefore,  one  would  expect  the  conjugate  gradient  method 
with  the  «>effideat  matrix  given  in  (4.33)  to  perform  as  if  the  matrix  was  ’^ai* 
most^  block  diagonal  The  algorithm  should  produce  in  a  very  small  number 
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of  iterations  (the  associate -’i  diagonal  block  is  nearly  the  identity  matrix),  and 
it  should  act  like  algorithm  BNP  on  the  lower  block  of  the  system.  In  a  sense, 
then,  BNP  may  be  viewed  as  the  limiting  case  of  the  preconditioned  weighting 
method.  This  is  true  despite  the  fact  that  PWgt  solves  an  nxn  system  rather 
than  a  reduced  order  problem.  In  the  limit  as  t  -+  oo,  the  mi  leading  unknowns 
in  the  PWgt  system  are  detemhaed  by  what  amounts  to  a  trivial  linear  system, 
and  the  method  acts  as  if  it  were  an  order-reducing  algorithm  applied  to  the 
lower  (n-mi)  unknowns. 

When  formalizing  the  coimection  between  the  two  schemes,  however,  there  is 
one  subtle  point  to  note.  It  is  tempting  to  start  with  the  block  system  in  (4.33), 
delete  terms  involving  negative  powers  of  r,  and  expect  to  recover  algorithm 
BNP.  It  would  quickly  become  clear  that  this  does  not  work. 

The  problem  is  this:  while  the  lower  left  block  of  the  coefficient  matrix  is 
0(t“'),  the  vector  wz  is  0(r).  Thus  the  product  of  these  two  components  is 
0(1)  and  cannot  be  ignored.  To  see  why  this  is  so,  recall  that  w  —  W\y  by 
definition,  which  means 


Thus 

wz  -  rEty  =  t6,  (4.39) 

and  there  is  no  need  to  compute  wz  at  all. 

But  there  is  also  a  natural  interpretation  of  the  lower  block  wr.  From  (4.38) 
as  well  as  the  definition  of  ri  given  in  equations  (2.6)  and  (3.4),  we  have 

tOR  =  G\y  ss  Cl  -  ri.  (4.40) 

Now,  armed  with  a  clearer  picture  of  what  the  blocks  in  w  represent,  we  can 
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use  (4.39)  and  (4.40)  to  produce  algorithm  BNP  from  the  weighting  method. 

Begin  with  the  second  block  equation  in  (4.33): 

-Y^GitEl^WL  -h  (/  +  Y^Y)wr  =  Cl  +  (4.41) 

T 

Now  substitute  (4.39)  and  (4.40)  into  this  equation,  and  rearrange  to  obtain 

(/  +  Y^Y)ri  =  Y^iYci  +  G21E1H  -  cj).  (4.42) 

Note  that  the  substitutions  have  eliminated  t  entirely;  in  fact,  minor  simplifi¬ 
cation  of  the  righthand  side  produces  the  BNP  system  (3.21). 

Of  course,  the  BNP  unknown  and  righthand  side  are  different  than  those 
for  the  preconditioned  weighting  method,  so  we  can  expect  some  variation  in 
relative  psrfonnance  from  one  problem  to  another.  Moreover,  each  method  has 
its  own  advantages  and  disadvantages.  BNP,  for  example,  is  a  “parameter-free” 
algorithm,  while  PWgt  requires  the  user  to  specify  r:  if  r  is  too  small,  the 
solution  to  the  weighted  problem  will  be  a  poor  approximation  to  the  solution 
of  problem  LSE;  if  r  is  too  large,  then  underflow,  overflow,  or  roundoff  errors 
will  degrade  accuracy  or  prevent  normal  termination.  A  single  PWgt  itera¬ 
tion,  on  the  other  hand,  is  generally  slightly  faster  than  a  corresponding  BNP 
iteration:  BNP  requires  gather/scatter  vector  operations  (multiplications  with 
permutatioM  of  K  and  K"^)  that  are  not  needed  in  the  weighting  algorithm.  We 
demonstrate  in  chapter  6  that  the  algorithms  do  indeed  perform  comparably, 
but  that  neither  is  clearly  superior  to  the  other. 
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5.  Implicit  Nullspace  Methods 


Algorithm  BNP  as  originally  proposed  has  at  least  two  limitations  we  wish  to 
overcome.  The  first  concerns  the  conditions  under  which  we  can  successfully 
construct  a  suitable  preconditioning  matrix  Ai.  Theoretically,  assumptions  HI 

r  1 

and  H2  in  §2.1  are  sufficient:  if  E  has  full  row  rank,  and  q  have  full  column 

f  -E  1  • 

rank,  then  one  can  find  rows  of  G  to  construct  Gi  so  that  Ai  =  ^  is 

L  . 

non-singular.  In  practice,  though,  it  may  be  quite  difficult  to  determine  the 
correct  rows  of  G  to  use  [5].  Moreover,  even  if  we  can  identify  suitable  rows  of 
G,  the  resulting  Ai  may  not  be  easily  invertible. 

As  long  as  G  itself  has  full  column  rank  and  relatively  simple  structure,  we 
can  successfully  produce  an  upper  triangular  Ai  by  interlacing  (§3.1).  Suppose, 
though,  that  G  lacks  full  column  rank.  Then,  given  a  specified  column  of  G, 
there  may  be  no  row  of  G  with  leading  non-zero  in  that  column,  even  after 
orthogonal  reduction.  If  we  need  such  a  row  to  augment  the  stairstep  matrix 
E„  we  will  not  be  able  to  produce  an  upper  triangular  Ai  by  interlacing. 

Even  if  G  has  full  column  rank,  we  may  encounter  difficulties  forming  Ai 
by  interlacing.  Suppose  it  is  impractical  to  reduce  G  to  upper  triangular  form 
(this  is  true,  for  example,  in  the  Stokes  problem  described  in  §2.3).  In  this  case, 
interlacing  can  only  succeed  if  the  unreduced  (or  partially  reduced)  matrix  G 
has  rows  with  leading  non-zeros  in  ail  columns  in  which  Et  does  not  have  leading 
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non-zeros.  Fortunately,  G  as  given  in  §2.3  has  rows  with  leading  non-zeros  in 
every  possible  column,  but  this  is  merely  a  consequence  of  the  simple  geometry 
we  chose  for  the  domain. 

The  second  difficulty  concerns  problems  expressed  in  the  saddle  point  form 
given  in  (2.9);  again,  the  Stokes  model  is  a  good  example.  Here  the  vector  c, 
which  is  needed  to  form  the  righthand  side  of  the  BNP  system  (3.21),  is  not 
generally  available;  instead,  the  vector  s  =  —  is  specified.  Depending  on  the 
structure  of  G  (if  G  is  even  known),  it  may  or  may  not  be  practical  to  determine 
c  explicitly. 

One  way  around  this  obstacle  is  to  attempt  to  rewrite  the  righthand  side 
of  the  BNP  system  in  terms  of  $  rather  than  c.  But  this  is  not  possible  for 
the  problem  as  formulated  in  (3.21),  because  the  righthand  side  lacks  a  critical 
term  involving  ci  (the  term  is  buried  in  the  unknown  ri).  Another  approach,  of 
course,  is  to  use  a  different  algorithm.  The  preconditioned  weighting  algorithm 
is  one  possibility,  and  a  little  algebra  does  allow  us  to  rewrite  the  righthand  side 
of  the  PWgt  system  (3.47)  in  terms  of  a  rather  than  c: 


(5.1) 


But  even  this  doesn’t  overcome  the  difficulty  completely.  We  still  need  to  con¬ 
struct  as  defined  in  (3.43),  and  this  requires  having  G  in  a  form  suitable 
for  use  in  the  interlacing  scheme.  We  would  like  to  produce  an  order-reducing 
conjugate  gradient  algorithm  which  can  solve  saddle  point  problems  in  which 
neither  c  nor  G  is  readily  available. 

In  this  chapter  we  extend  algorithm  BNP  to  a  class  of  methods  capable  of 
dealing  with  each  of  these  difficulties.  Our  approach  is  based  on  the  classiccd 
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nullspace  method,  which  makes  use  of  a  matrix  N  (the  basis  matrix)  whose 
columns  form  a  basis  for  the  nullspace  of  the  equilibrium  matrix.  The  technique 
used  in  §4.3  to  relate  problem  LSE  to  an  unconstrained  problem  is  a  special  case 
of  the  nullspace  method. 

In  §5.1  we  describe  the  nullspace  method,  and  show  that  algorithm  BNP  may 
be  viewed  as  a  variation  of  this  method  for  a  certain  distinguished  choice  of  the 
basis  matrix.  In  BNP,  however,  the  basis  matrix  is  used  but  never  explicitly 
formed;  thus  algorithm  BNP  becomes  an  example  of  a  class  of  methods  we  call 
implicit  nullspace  methods.  We  describe  in  §5.2  a  technique  for  producing 
implicit  nullspace  methods  for  other  choices  of  the  basis  matrix,  and  outline 
a  general  algorithm  suitable  for  implementation.  Then,  in  §5.3,  we  propose 
specific  examples  of  implicit  nullspace  methods,  emphasizing  ways  to  overcome 
the  difficulties  described  above.  In  the  next  chapter,  we  will  report  on  experi> 
ments  with  each  of  these  methods,  using  test  problems  based  on  both  the  static 
analysis  of  engineering  structures  and  Stokes  flow. 

5.1  BNP  as  an  Implicit  Nullspace  Method 

The  original  derivation  of  algorithm  BNP,  reported  in  [4]  and  detailed  in  §3.1, 
mvolves  applying  block  elimination  to  the  modified  Kuhn-Tucker  equations.  In 
this  section,  we  derive  the  method  in  a  new  way,  establishing  a  connection 
between  algorithm  BNP  and  the  classical  nullspace  method  (see,  for  example, 
[7],  [31]).  This  nullspace  characterization  of  BNP,  first  reported  in  James  and 
Plemmons  [20],  leads  to  the  extension  we  describe  in  §5.2. 

We  begin  with  a  description  of  the  nullspace  method  itself.  Suppose  we  are 
given  a  convenient  particular  solution  yj,  to  the  constraint  Ey  ss  6,  and  a 
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matrix  N  whose  columns  form  a  basis  for  the  nuUspace  of  E  (for  convenience, 
we  will  call  N  a  basis  matrix).  Then  any  vector  satisfying  the  constraint  can 
be  written  as  y  =  y,  +  JVx  for  some  choice  of  x,  and  minimizing  [[Gy  —  c\\i 
subject  to  the  constraint  amounts  to  minimizing  |I(?(yp  +  Nx)  —  c||2  over  all 
possible  X.  The  latter  minimization  is  an  unconstrained  problem  of  order  n— mi. 
Forming  the  associated  normal  equations  confirms  that  the  required  x  solves  the 
symmetric  poiitive  definite  system 


N^Cf^GNx  =  N^(^{c  -  Gyp). 


(5.2) 


Now  let  E  be  the  upper  trapezoidal  matrix  Ft  ~  Er  given  in  (3.6). 
Note  that 

r  1 

(5.3) 


EZ^b 

0 


is  a  particular  solution  of  the  constraint  Ety  s  5,  and  the  columns  of 


-Ez'Ek  ] 
I 


(5.4) 


form  a  basis  for  the  nuUspace  of  Et.  Now  write  the  nullspace  normal  equations 
(5.2)  with  these  choices  of  the  basis  matrix  and  particular  solution: 


NJCFGNix  =  NfG^{c  -  Gyp). 


(5.5) 


Note  that  we  can  apply  the  conjugate  gradient  algorithm  to  these  normal 
equations  without  forming  iV)  explicitly;  an  arbitrary  matrix-vector  product 
mvolving  Ni  requires  only  a  triangular  solve  with  Ec,  and  a  matrix-vector 
product  with  Er.  Thus,  this  approach  to  solving  problem  LSE  is  the  simplest 
possible  example  of  what  we  caU  an  implicit  nullspace  method,  or  E^M:  we 
solve  the  nullspace  normal  equations  using  a  distinguished  choice  of  the  basis 
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matrix  N,  but  do  not  actual  form  this  matrix.  In  the  next  section,  we  develop 
a  more  general  version  of  algorithm  INM. 

Now  precondition  in  the  standard  way  with  Gu,  where  Gu  is  the  lower 
righthand  comer  of  the  upper  triangular  matrix  G  as  defined  in  (3.7): 


G:^Nf(fGNjG:[iw  «  Gi^Nj(^{c-Gy^),  where  w  =  Gux.  (5.6) 


Notice  that  this  preconditioned  set  of  normal  equations  is  it;3elf  an  example  of 
an  implicit  nuUspace  method:  since  iV/  is  a  basis  matrix,  so  is  NjB  for  any 
non-singular  B.  Thus,  the  linear  system  (5.6)  reflects  N  —  NiG^j  as  the  choice 
of  basis  matrix.  In  fact,  we  can  say  more  about  this  system.  The  formidable 
looking  coefficient  matrix  simplifies  nicely,  and  it  will  lead  directly  to  the  BNP 
system. 

First  recall  the  technique  used  in  §3.1  for  augmenting  the  trapezoidal  matrix 
Et.  The  matrix  Ai  is  defined  to  be 


At 


El  Er 

Gia 


(5.7) 


The  inverse  of  this  version  of  A\  is  given  by 

EZ^  -EZ^ErG^^] 


which  in  turn  tells  us  that 


(5.8) 


Thus  Y  =s  A^AZ^E  can  be  written 


—EZ^Er 

I 


Gzi  =  NiGZl 


(5.9) 


Y^GiNiGZ^, 


(5.10) 
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Now  return  to  the  normal  equations  given  in  (5.6).  Prom  equation  (4.16)  we 
know  that  G\Ai^K  =  /.  This.,  combined  with  (5.10),  gives  us 


GNiG:^  ^ 


Gi 


I 

Y 


(5.11) 


Therefore,  the  coefficient  matrix  in  (5.6)  is  J  +  which  is  precisely  the 
coefficient  matrix  in  BNP. 

While  the  BNP  system  (3.21)  and  the  preconditioned  nuUspace  system  (5.6) 
involve  different  unknowns  and  righthand  sides,  we  can  interpret  the  unknown 
to  in  a  way  which  completes  the  connection  between  the  two  methods.  Partition 


VL 

VR 


(5.12) 


compatibly  with  the  rows  of  Ni,  and  use  the  fact  that  y  =s  yp  +  jV/s  to  find  that 
X  s  yR.  Thus,  the  deiining  equation  to  s  G^x  can  be  written 


to  ss  Gityn  =  G\y.  (5.13) 

But  rx  =  Cl  -  G\y  by  the  definition  of  rj.  Thus  we  obtain  a  relationship  between 
the  BNP  unknown  ri  and  the  unknown  to  in  the  nullspace  normal  equations: 


to  =3  Ci  -  ri.  (5.14) 

Use  this  fact  to  rewrite  (5.6)  as 

(/  Y^Y){c,  -  n)  =  G^^NjG-ic  -  Gyp).  (5.15) 

Finally,  move  ci  to  the  right-hand  side  and  simplify:  the  result  is  the  BNP  sys¬ 
tem  as  given  in  (3.21).  Thus  algorithm  BNP  is  essentially  an  implicit  nullspace 
method,  with  N  w  iV/GJa  as  the  (unformed)  basis  matrix.  While  we  have  shown 
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this  only  for  the  simple  augmentation  of  the  trapezoidal  matrix  Et,  the  conclu¬ 
sion  also  holds  when  using  interlacing  to  augment  the  stairstep  matrix  E,. 

The  only  significant  distinction  between  the  BNP  system  (3.21)  and  the 
system  in  (5.15)  is  the  fact  that  the  two  systems  involve  difierent  unknowns 
(and,  of  course,  difierent  righthand  sides).  In  the  next  section,  we  will  see 
that  this  change  in  unknown  is  exactly  what  we  need  to  produce  an  algorithm 
suitable  for  saddle  point  problems. 

5.2  Algorithm  INM:  General  Case 

_  p-i  p 

In  the  last  section,  we  began  with  a  certain  natural  basis  N/  =  /  ^ 

the  nullspace  of  the  trapezoidal  equilibrimn  matrix  Et,  and  demonstrated  that 
algorithm  BNP  can  be  viewed  as  an  impUcit  ntiUspace  method  with  N  s  NiGi^ 
as  the  basis  matrix.  We  also  observed  that  N  a  NiB  is  a  basis  matrix  for 
any  non-singular  matrix  B.  The  ba«c  idea  behind  the  extension  of  BNP 
should  now  be  clear:  instead  of  using  Gf}  as  a  preconditioner  for  the  nullspace 
normal  equations  given  in  (5.5))  precondition  (5.5)  with  any  convenient  non¬ 
singular  (n-mi)x(n-mi)  matrix  B,  producing  an  implicit  nullspace  method 
with  N  s  N[B  as  the  basis  matrix.  But  so  far  we  have  only  considered  the 
trapezoidal  matrix  Et  obtained  by  column  pivoting.  To  make  this  approach 
practical,  we  would  like  to  use  interlacing  to  construct  a  basis  matrix  and  par¬ 
ticular  solution  for  the  stairstep  matrix  E,. 

Recalling  equation  (3.11),  let  P  =s  |  Pr  |  be  a  permutation  matrix 
relating  the  tr:^>ezoidal  matrix  Et  and  the  stairstep  matrix 

E,P  ==  [  E,Pl  E,Pr  ]  =  [  Br  ]  =  (5.16) 


78 


Now  let  JWi  be  a  matrix  of  size  (n—mi)xn  such  that  the  matrix  Bi  defined  by 

>  * 

is  non-singular;  we  will  call  such  an  Mi  an  augmentation  matrix.  It  is  qmte 
easy  to  construct  an  M\  with  this  property:  use  rows  with  leading  non-zeros 
in  the  (n— mi)  columns  in  which  the  stairstep  matrix  E,  does  not  have  leading 
non-zeros.  Think  of  Mi  as  generalizing  the  role  that  G\  plays  in  BNP.  In  the 
special  case  E,  =  Et,  we  would  have  Mi  =  [  0  Mia  1 1  where  Mia  plays  the 
role  of  Gia. 

Finally,  by  analogy  with  Ai  in  equation  (3.10),  apply  interlacing  to  produce 
an  upper  triangular  nutrix  defined  by 

B,  =  PBi~P  .  (5.18) 

We  are  now  in  a  position  to  define  the  basis  matrix  N  and  the  particular 
solution  y^. 

Theorem  5.1  The  vector  y,  =  ^  satisfies  the  constraint  E,y  —  6  for 

»  , 

ony  choice  of  v. 

Proof  Note  that  where  Bi  is  defined  in  equation  (5.17).  Also, 

by  an  argument  similar  to  equation  (4.16),  we  find  that  E,B{^  s  I  /  0  I .  Use 
these  two  facts  to  establish  the  result: 

E.y,  =  E.B{'P^P  ‘  =E,B{'  ‘  =  [  ^  0  ]  [  J  ■ 

which  is  6  as  required.  □ 

The  choices  v  =  0  and  v  a  ci  are  particularly  convenient  when  using  the 
results  of  Theorem  5.1  to  define  y^;  both  are  present  naturally  when  initializing 
quantities  in  advance  of  forming  Bi. 


Theorem  5.2  The  columns  of  N  =  form  a  basis  for  the  nulbpace  of 

the  stairstep  matrix  E,. 


Proof.  N  is  the  proper  size  aud  has  full  column  rank,  so  we  need  only  establish 

that  EN  =  0.  Recall  from  the  proof  of  Theorem  5.1  that  =  Bx^P^  and 

0 


E,^^  “  [  ^  ®  = 


by  orthogonality.  So 


EN  =  E,B:^Pr  »  E,^^P^Pr  =  [  /  0  ] 


0 

I  '  ’ 


which  is  zero  as  required.  □ 

These  two  theorems  give  us  what  we  need  to  describe  the  'mplicit  nulbpace 
method  in  a  general  setting:  we  implement  the  algorithm  by  applying  the  con¬ 
jugate  gradient  method  to  the  factored  nulbpace  normal  equations  (5.2),  with 
N  and  y,  as  above.  Table  5.1  summarizes  the  algorithm.  In  the  table,  Sk  is  the 
conjugate  gradient  direction  vector,  and  <4  »  N^(j^{c  -  Gyp)  -  N^CP'GNx^^^^ 
b  the  residual  associated  with  the  normal  equations.  The  vector  qu  stores  the 
product  of  the  coefficient  matrix  with  the  direction  vector.  We  use  a  starting 
vector  of  a  0  (which  gives  us  yW  a  y^),  but  an  arbitrary  starting  vector 
presents  no  difficulties. 

Unlike  BMP,  algorithm  INM  b  simple  to  modify  for  problems  in  saddle  point 
form  (2.9):  when  c  and  G  are  not  available,  simply  substitute  s  =  —G^c  in  the 
nulbpace  normal  equations  (5.2)  to  obtain  the  new  righthand  side 


(5.19) 


Abo  note  that  it  b  easy  to  preserve  the  opportunities  for  block-based  parallelism 
discussed  in  §3.1:  gven  a  substructured  problem,  we  need  only  construct  A/i 
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Table  5.1:  Implicit  NuUspace  Method  (General  Case) 


1.  Use  Gauss  Elimination  or  orthogonal  reduction  on  E  and  b 
to  replace  E  with  its  stairstep  form  E,. 

2.  Choose  a  convenient  augmentation  matrix  M^.  The  interlacing 
information  (represented  by  the  permutation  matrix  P  below) 
can  be  stored  in  a  pointer  vector. 


3.  Form  B\^P 


4.  Initialize: 


andSo^P 


(a)  as  0 

(b)  do  «  P^Bfa^ic  -  GB{%) 

(c)  So  as  dj 

S.  For  lb  S3  0, 1,. . . ,  until  d|'(4  <  toleraace: 

(a)  qt  as  P^BfFBT^PH»k 

(b)  Ok^^dk/alqk 


(d)  0k+i  as  4'+,4+l/d|‘dk 

(e)  sui  =  dk+i  4*  0k*iak 

6.  Recover  y  as  Bx'{Prx  4*  ib)  and  exit. 
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so  that  it  conforms  to  the  substructuring  to  ensure  that  interlacing  produces  an 
upper  triangular  matrix  with  the  proper  structure. 

It  remains  to  show  that  we  can  deal  with  problems  in  which  G  lacks  full 
column  rank.  We  will  deal  with  tins  topic  in  the  next  section,  when  we  look  at 
specific  choices  of  the  augmentation  matrix  Mi. 

5.3  Examples  of  Implicit  Nullspace  Methods 


In  this  section,  we  propose  some  sp^afic  examples  of  implicit  nullspace  methods, 
and  discuss  circumstances  under  which  they  might  be  appropriate.  Each  is  based 
on  a  natural  choice  of  the  augmentation  matrix  Mi. 


Algonthm  INMI 

The  simplest  possible  implicit  nullspace  method,  which  we  call  algorithm 
INMI)  involves  augmenting  the  stairstep  matrix  with  rows  of  the  (scaled  or 
unsealed)  identity  matrix  I.  If,  for  example,  the  matrix  £«  has  the  patierc 


(5.20) 


then  define 


After  interlacmg,  we  have 


(5.21) 


(5.22) 
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If  E,  has  substnictured  form,  then  so  does  Bi,  and  triangular  solves  involving 
Bi  and  Bf  may  be  done  in  parallel  as  described  in  §3.1.  Moreover,  these  solves 
can  be  coded  to  exploit  the  presence  of  rows  of  the  identity. 

A  formal  description  of  the  INMI  interlacing  begins  with  a  permutation 
matrix  P  relating  the  stairstep  form  E,  to  the  trapezoidal  form  Et,  Given 

E,Pr]=^[El  Er]=^E,  (5.23) 

as  in  (3.11),  the  matrix  Mi  =  Pr  is  precisely  the  augmentation  matrix  we  seek. 
With  this  choice  of  M\,  it  is  easy  to  show  that  the  basis  matrix  N  —  B^^Pr 
reduces  to  ss  PiV/,  where  Nt  is  the  fundamental  basis  matrix  defined  in 
equation  (5.4).  Similarly,  the  particular  solution  =  Bf  q  is  actu 
S'*' 5 

ally  y>  »  P  q  ,  which  is  a  row  permutation  of  the  particular  solution 

»  A 

given  in  (5.3).  Thus  INMI  is  nothing  but  the  implicit  nulispace  method  in  (5.5), 
corrected  for  ioterladng. 

Algorithm  INMI  is  one  way  to  solve  problems  in  which  G  is  unavailable, 
prohibitively  dense,  lacks  full  column  rank,  or  is  otherwise  unsuitable  (oi  irtet' 
lacing;  in  particular,  INMI  is  capable  of  handling  saddle  point  problems.  But, 
since  we  ignore  all  information  in  the  matrix  G  when  constructing  the  precon¬ 
ditioner,  there  is  good  reason  to  expect  little  preconditioning  effect  beyond  the 
order  eduction  itself.  This  is  in  fact  what  we  observed  in  our  experiments  (see 
chapter  6). 

Algorithm  INMG 

If  (7  has  full  column  rank  and  convenient  structure,  we  can  augment  B,  by 
interlacing  rows  of  G,  exactly  as  we  do  iu  algimthm  BNP.  In  fact,  as  shown 
in  the  previous  so:tion,  the  resulting  implicit  nuUspace  method,  which  we  call 
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algorithm  INMG,  has  the  same  coefficient  matrix  as  the  BNP  system.  Of  course, 
as  with  BNP,  algorithm  INMG  may  fail  if  G  lacks  full  column  ramk,  since  the 
rows  of  G  needed  to  augment  E,  may  not  be  available. 

The  linear  systems  in  BNP  and  INMG  differ  only  in  their  unknowns  and 
righthand  sides.  Let  a  be  the  unknown  coordinate  vector  in  INMG  (in  §5.2  we 
used  w  for  other  reasons).  Then,  by  the  analysis  in  the  previous  section,  x  is 
related  to  the  unknown  n  in  BNP  by 

(5.24) 

Recall  from  §5.2  that  the  translated  unknown  in  INMG  gives  us  an  important 
advantage  over  algorithm  BNP;  unlike  BNP,  algorithm  INMG  can  be  used  to 
solve  saddle  point  problems.  We  demonstrate  this  on  a  Stok«:  problem  in  chap¬ 
ter  6. 

In  the  special  case  c  »  0  (e.g.  the  structures  application),  the  two  unknowns 
satisfy  X  •»  —ri,  so  the  BNP  and  INMG  systems  differ  only  by  a  negative  sign. 
The  stop  criteria  in  the  algorithms  also  have  the  same  interpretation,  so  we 
should  expect  the  two  algorithms  to  perform  identically.  In  fact,  INMG  out¬ 
performed  BNP  in  all  our  experiments  (see  chapter  6);  this  is  bow  we  identihed 
the  instability  in  the  BNP  recursion  (3.34)  deiiuiag  the  defect  vk- 

As  with  each  of  the  methods  we  describe,  there  are  opportunities  to  code 
solves  and  nu^rix-vector  products  in  INMG  to  exploit  the  special  properties  of 
the  preconditioner.  In  particular,  depending  on  the  structure  of  F  and  G,  one 
may  want  to  use  the  &ct  that  »  /  to  compute  the  conjugate  gradient 

direction  vector 
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Figure  5.l!  Interlacing  in  Algorithm  INMGI 


Algorithm  INMGI 

Wo  have  already  observed  several  times  that  ii  G  lacks  full  column  rank,  we 
may  be  unable  to  interlare  rows  of  £»  and  C  to  produce  an  upper  triangular 
matrix  J?t.  In  particular,  if  there  Is  no  row  of  E,  with  leading  non-zero  in 
column  j,  then  interlacing  requires  the  exirtence  of  a  row  of  G  with  leading 
non 'Zero  in  that  colunm.  When  G  lacks  full  column  rank,  such  a  row  of  G  may 
or  may  not  exist. 

There  is,  however,  an  obviom>  way  to  combine  the  techniques  in  algorithms 
IN  MI  uid  IN  MG  to  overcome  tins  difBculty.  One  can  use  rows  of  G  to  augment 
Eg  when  such  rows  are  available,  and  use  rows  of  the  identity  matrix  I  when 
there  is  no  appropriate  row  in  G.  We  call  this  ^proach  algorithm  INMGI. 

Consider,  for  example,  the  matrices  in  figure  5.1.  To  produce  Bi,  the  aug- 
ment^ion  matrix  Mi  requires  rows  with  leading  non-zero.s  in  each  of  columns 
I,  3,  and  6.  There  are  rows  of  G  with  leading  non-zeros  in  columns  1  and  6,  so 
we  include  these  rows  in  Mi.  The  remaining  row  of  Mi  comes  from  the  identity 


matrix;  after  mterladng,  Bi  has  the  depicted  form. 

While  this  approach  is  successful  in  solving  problems  involving  rank  deficient 
G,  there  is  a  potential  problem:  unless  the  non-zeros  in  all  rows  of  Mi  are  of 
roughly  the  same  magnitude,  the  resulting  basis  matrix  may  have  columns  of 
widely  varying  size,  resulting  in  a  badly  conditioned  coefficient  matrix.  Consider 
a  very  simple  example.  Let  E  be  of  the  form 

£=  [El  Ec  (5.25) 

where  Et,  is  upper  triangular  and  non-singular.  Suppose  there  are  no  rows  of  G 
with  leading  non-zeros  in  the  columns  associated  with  Ec^  Then  form 

Et,  Ec  Er 

D  ,  (5.26) 

where  D  is  a  diagonal  matrix  (a  scaled  version  of  the  portion  of  I  used  in 
the  augmentation),  and  G\t  is  again  the  lower  righthand  corner  of  G.  We  can 
explicitly  calculate  the  inverse  firom  that,  we  obtain  the  basis  matrix: 

Ar=[c,i5-'  Cjffr,'].  (5.27) 

where  Ci  and  Cs  are  given  by 

’  ^El^Ec  ■ 

/  ,  Ca=  0  .  (5.28) 

0  / 

»  «  W  a 

Assuming  the  non-zero  elements  of  B  are  of  roughly  the  same  magnitude, 
the  non-zeros  of  C\  and  C)  axe  likely  to  be  of  order  one.  Thus,  the  conditioning 
of  the  basis  matrix  depends  on  Bf*  and  Gi^.  If  the  non-zeros  in  one  of  these 
matrices  are  of  different  magnitude  than  those  in  the  other,  the  basis  matrix 
(and  hence  the  nullspace  normal  equations)  will  probably  be  poorly  conditioned. 
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We  observed  this  phenonenon  in  our  experiments:  without  scaling,  conver¬ 
gence  of  INMGI  actually  required  more  iterations  than  the  size  of  the  problem. 
In  our  engineering  test  problems,  however,  the  blocks  of  G  were  all  of  compa¬ 
rable  magnitude.  Thus,  we  simply  scaled  G  and  c  by  a  constant  multiple  of 
the  identity  so  that  the  non-zeros  in  G  were  0(1).  The  restilts,  reported  in 
chapter  6,  were  much  more  encouraging. 

Algorithm  INMF 

We  have  already  proposed  one  implicit  nuUspace  method,  algoiithm  INMI, 
capable  of  solving  problems  in  which  the  matrix  G  is  either  unavailable  or 
unsuitable  for  interlacing.  Intuitively,  though,  this  approach  seems  less  than 
promising,  since  we  ignore  all  information  in  G  when  constructing  the  precon¬ 
ditioner.  Experiments  reported  in  chapter  6  confirm  that  the  preconditioner 
produced  by  INMI  is  not  effective  at  all. 

There  is,  however,  another  way  to  approach  the  problem:  we  can  use  infor¬ 
mation  in  F  ss  G^G  to  construct  the  augmentation  matrix.  One  possibility  is 
to  generate  a  Cholesky  factorization  of  a  portion  F  (for  example,  the  block 
diagonal  portion),  then  use  rows  of  this  factorization  to  form  Mi.  Another 
option,  as  yet  untested,  is  to  use  selected  rows  of  an  incomplete  Cholesky 
factorization  of  F  (see  the  appendix  for  a  brief  description  of  such  a  factor¬ 
ization).  In  both  cases,  the  intent  is  that  the  factorization  produces  a  G  which 
b  at  least  a  rough  approximation  of  the  matrix  G. 

While  intuition  suggests  th^  thb  approach  should  produce  a  better  pre- 
conditioner  than  algorithm  INMI,  there  b  one  potential  problem  with  using 
approximate  factoriz^ions  of  F  to  form  Mi.  It  b  important  to  remember  that 
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we  are  not  using  the  entire  factorization  as  a  preconditioner;  we  are  instead  us¬ 
ing  only  selected  rotas  of  what  is  already  a  crude  approximation  to  G.  This  may 
result  in  an  augmentation  matrix  of  little  or  no  value.  It  is  possible,  perhaps 
likely,  that  the  cost  of  both  producing  the  factorization  and  using  the  resulting 
Bi  within  iterations  may  prove  to  be  wasted  effort.  In  fact,  this  is  precisely 
what  we  observe  in  one  of  the  experiments  described  in  the  next  chapter. 

In  chapter  6,  we  report  on  experiments  with  most  of  the  methods  described 
here.  We  note,  however,  that  there  are  other  possibilities  which  remain  untested. 
One  could,  for  example,  use  an  incomplete  orthogonal  factorization  of  a 
complicated  matrix  G  to  produce  a  simpler  approximation  G  suitable  for  in¬ 
terlacing  (see  appendix).  Another  possibility  is  what  one  might  call  a  partial 
orthogonal  factorization  of  G:  instead  of  reducing  G  completely,  use  orthog¬ 
onal  rotations  on  only  a  selected  portion  of  G  to  produce  a  matrix  d  suitable  for 
interlacing.  It  is  clear  that  the  possibiliti(»  are  endless:  algorithm  INM  offers  a 
great  deal  of  fleadbility  in  constructing  the  preconditioner.  What  we  have  yet 
to  establish  is  whether  any  of  the  proposed  algorithms  efficiently  solve  problem 
LSE  in  its  various  formul^ions. 
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6,  Numerical  Experiments 


la  this  chapter,  we  report  on  the  results  of  numerical  experiments  with  the 
iterative  algorithms  discussed  in  this  thesis,  including  both  the  conjugate  gradi¬ 
ent  methods  (BNP,  Pwgt,  and  various  forms  of  INM)  and  the  linear  stationary 
methods  (2-cyclic  SOR  and  3<A0E).  We  compare  and  contrast  the  behavior  of 
the  algorithms,  and  examine  parallel  performance  on  substructured  problems. 

Section  6.1  is  a  summary  of  the  conditions  under  which  we  conducted  the 
tests.  In  §6.2,  we  consider  three  small  models  of  elastic  structures.  Besides 
providing  us  with  an  opportunity  to  validate  the  codei,  these  problems  offer  the 
best  means  of  studying  the  behavior  of  2-SOR  and  3-AOR.  Then,  in  §6.3,  we 
consider  the  performance  of  the  cxmjugate  gradient  methods  on  elastic  problems 
of  more  realistic  size. 

La  the  last  two  sections,  we  look  carefully  at  the  extensions  to  algorithm  BNP 
by  considering  problems  for  which  BNP  b  not  well  suited.  Section  6.4  descril^ 
models  structures  with  simulated  Va-Ttage,”  producing  problems  in  which 
the  matrix  G  lacks  full  column  tank.  La  §6.5  we  consider  the  markei-and-cell 
saddle  point  formulation  of  the  Stokes  problem. 
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6.1  Overview  of  Experiments 

All  experiments  were  run  on  a  two-processor  AUiant  FX/40.  We  employed  full 
optimization,  including  vectorization,  on  all  test  problems,  but  we  inhibited 
vectorization  within  the  codes  when  it  was  appropriate  to  do  so.  We  also  used 
the  DAS  compiler  option,  allowing  the  compiler  to  assume  that  finite  precision 
arithmetic  is  associative.  This  gives  the  compiler  the  flexibility  to  rearrange 
the  order  of  computations,  enhancing  the  opportunity  for  concurrent  execution. 
Occasionally,  however,  the  DAS  option  causes  the  results  for  an  experiment  run 
on  two  processors  to  differ  slightly  from  those  obtained  on  a  single  processor; 
in  partictilar,  one  sometimes  observes  minor  difforencei  in  the  total  number  of 
iterations  required  to  achieve  a  sdution  of  a  specified  accuracy.  To  simplify  the 
tables,  we  report  iteration  counts  only  for  the  two  processor  case. 

Elxecutioa  times  (in  seconds),  obtained  using  the  AUiant  etiae  intrinsic,  in¬ 
clude  aU  c^>eratioas  ex<»pt  input/output  (for  the  Stokes  problems,  this  includss 
the  cost  of  generating  the  matrices).  In  aU  problems,  however,  only  the  iteration 
times  were  significant:  pre-processing,  including  factoring  B  and  completing  the 
interlacing  step,  typically  required  only  1-2%  of  the  cpu  time. 

For  convenience,  we  list  the  algorithms  tested  in  our  experiments: 

BNP:  order-reducing  conjugate  gradient  method  due  to  Barlow,  Nichob,  and 
PlemmoQs  (§3.2). 

Pwgi:  preconditioned  weighting  method  (§3.3) 

2-SOR:  optimal  2-cycllc  successive  over-relaxation  applied  to  the  modified 
Kuhn-Tucker  equations  (§3.3). 
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3-AOR:  optimal  three-block  accelerated  over-relaxation  applied  to  the  modi¬ 
fied  Kuhn-Tucker  equations  (§3.3). 

INMI:  implicit  nuUspace  method  in  which  the  preconditioner  is  formed  by 
interlacing  rows  of  E  with  rows  of  the  identity  matrix  I  (§5.3). 

INMG:  implicit  nuUspace  method  in  which  the  preconditioner  is  formed  by 
interlacing  rows  of  E  with  rows  of  G  (§5.3);  essentiaUy  equivalent  to  al¬ 
gorithm  BNP. 

INMGL  implicit  nuUspace  method  in  which  the  preconditioner  is  formed  by 
interlacing  rows  of  E  with  rows  of  G  when  the  appropriate  rows  of  G  are 
available.  When  such  rows  are  not  availad>le,  rows  of  the  identity  matrix  I 
are  used  to  complete  the  construction  (§5.3). 

INMF:  impUdt  nuUspace  method  in  which  the  preconditioner  is  iormed  by 
interlacing  rows  of  E  with  rows  constructed  in  some  way  fiom  information 
in  the  matrix  F  (§5.3).  In  our  tests  on  the  Stokes  problem,  we  use  rows 
&om  the  Cholesky  factor  of  the  block  diagonal  portion  of  F. 

AU  programs  were  written  in  FOHTRAN77  using  double  predsion  arithmetic 
and  sparse,  row-oriented  data  structures.  Since  we  attempted  to  solve  three 
distinct  typ^  problems  (elastic  structures,  structures  with  rank  deficient 
blodcs  in  G,  and  Stok»  fiow),  we  coded  as  many  as  three  versions  of  a  given 
algorithm.  For  each  type  of  problem,  we  designed  the  codes  so  that  the  data 
structures,  logic,  and  primary  subroutines  (espedaUy  the  Cactorbation  of  B,  the 
triangular  solv^,  and  mattix- vector  multiplications)  were  as  similar  as  possible 
one  algorithm  to  another.  We  did,'  of  course,  try  to  take  advantage  of 
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special  featiires  of  each  algorithm.  Thtis,  for  example,  the  codes  for  algorithm 
INMI  accomplish  triangular  solves  involving  Bi  by  exploiting  the  fact  that  Bi 
contains  rows  of  the  identity  matrix. 

We  did  not  write  special  codes  for  algorithm  INMG.  Instead,  we  used  the 
INMGI  codes  to  run  algorithm  INMG  on  full  rank  problems  (algorithm  INMGI 
is  equivalent  to  INMG  in  this  case,  since  the  rows  of  G  needed  for  interlacing 
are  always  available).  The  logic  and  data  structures  in  INMGI  axe  somewhat 
more  complex  than  for  the  oth^  algorithms,  since  the  code  must  be  able  to 
identify  and  deal  with  column  rank  deficiencies  in  G.  Thus,  the  times  reported 
for  algorithm  INMG  are  slightly  slower  than  they  would  be  had  we  written  code 
specifically  designed  for  full  rank  problems. 

We  decided  that  the  fairest,  moat  meaningful  way  to  compare  the  algorithms 
was  to  report  the  tima  requited  to  produce  a  result  of  specified  accuracy.  There- 
fore,  for  each  problem  we  constructed  a  Hrue”  solution  (by  means  described 
in  each  section),  and  adjusted  the  stop  tolerance  for  each  algorithm  until  the 
reported  error  (in  the  infinity  norm)  was  as  close  as  possible  to  a  specified  accu¬ 
racy.  We  used  the  standard  st<9  criterion  for  all  conjugate  gradient  methods:  if 
the  algorithm  srfives  the  symmetric  positive  definite  system  terminate 

execution  when  the  quantity  [{/  —  is  less  than  the  specified  tolerance 

(this  quantity  occurs  naturally  within  each  iteration).  For  the  linear  stationary 
methods,  we  terminated  execution  when  the  change  in  successive  iterates  was 
smaller  whan  the  specified  tolerance. 

In  the  pr^Konditioned  weighting  algorithm,  we  experimented  with  the  value 
of  the  weighting  parameter  r  to  obtain  the  best  possible  results.  In  all  cases,  the 
best  choice  of  r  was  between  10’  and  10^,  with  little  difference  in  performance 
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as  r  vavied  over  this  range.  This  is  consistent  with  the  theory  in  Barlow  [2]. 

We  also  report  that  we  have  have  run  the  codes  on  several  other  architec¬ 
tures,  including  the  Cray  Y-MP,  the  AUiant  FX/8,  and  Sun  3/50  workstations. 
We  have  only  anecdotal  results  on  these  machines,  and  did  not  attempt  to  opti¬ 
mize  the  ported  codes.  Therefore,  we  include  here  only  restilts  for  experiments 
on  the  FX/40. 

6.2  Small  Full-Rank  Structures  Problems 

The  three  small  test  problems  described  in  this  section  are  all  models  of  elastic 
structures  (see  §2.2).  The  resulting  omstraiued  minimization  problems  involve 
a  symmetric  positive  definite  matrix  F  which  is  block  diagonal.  Thus,  the 
corresponding  Cholesky  factor  G  is  square,  of  full  rank,  and  block  diagonal 
with  upper  triangular  blocks.  We  used  these  problems  as  a  means  of  validating 
<mr  codes,  but  they  also  gave  us  a  chance  to  compare  the  linear  stationary 
methods  (2-SOR  and  8-AOR)  against  the  conjugate  gradient  algorithms  (BNP, 
INMG,  INMI,  and  Pwgt). 

Problem  WEENCH4  (figure  6.1a)  is  a  substructured  version  of  a  test  prob¬ 
lem  due  to  Lawo  [6].  It  consists  of  48  planar  elements,  and  produces  a  problem 
with  112  omstraints  and  216  tmknowns.  The  diagonal  blocks  in  the  element 
flexibility  matrix  are  5x5  for  the  rectangular  elements,  and  3x3  for  the  tri¬ 
angular  elements.  The  applied  force  is  indicated  by  the  arrows  in  the  figure. 
We  partitioned  the  structure  into  four  substructures  as  shown  in  the  figure;  the 
resulting  transition  zone  has  50  columns. 

Problem  PAM2  (figure  6.1b)  is  a  trapezoidal  re^on  intended  to  be  a  rough 
approximation  of  a  cross-section  a  dam,  modelled  using  square  and  trian^ar 
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(b)  DAM2 


Figure  0.1:  Models  for  Small  FViU'Rank  Problems 

planar  elements  as  d^cribed  in  Prsemieniedd  [32].  like  WIIENCH4,  the  blocks 
in  F  are  5x5  for  the  rectaoguUr  dements,  and  3x3  for  the  triangular  elements. 
The  external  load  umulates  a  body  of  water  against  the  left  vertical  wall  The 
model  produces  a  problem  with  104  constraints  and  244  unknowns.  There  are 
two  substructures  as  shown  in  the  figure;  the  resulting  transition  zone  has  33 
columns. 

Problem  SOLIDl  (figure  6.1c},  also  modelled  using  t«dmiques  in  [32],  ap* 
proximates  a  building  subjected  to  the  force  of  a  steady  wind  approaching  one 
of  its  vertical  edgea  This  version  consists  ol  60  solid  tetrahedral  dements  (five 
tetrahedrons  in  each  small  cube),  and  produces  a  problem  with  81  constraints 
and  360  unknowns.  Each  of  the  60  blocks  in  the  blodc  diagonal  matrix  F  is 
6x6.  Again  there  are  two  substructures.  The  resulting  transition  zone  is  quite 
large,  consisting  of  120  out  of  360  columns. 

In  practice,  these  problems  are  all  too  small  to  justify  substructuriog  tech- 
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niques;  the  transition  zones  are  far  too  large,  and  the  svbstructures  themselves 
are  not  weil  balanced.  We  constructed  substnictured  versions  of  the  small  prob¬ 
lems  primarily  to  validate  the  codes.  Note  that  we  deliberately  chose  partition¬ 
ings  which  produced  at  least  one  unstable  substructure  (see  §2.4)  so  that  there 
would  be  diagonal  blocks  in  E  which  were  deficient  in  row  rank.  In  all  three 
problems,  we  could  have  chosen  natural  partitionings  which  produce  stable,  full 
rank  substructures. 

To  measure  error,  we  obtained  the  “true"  solution  to  each  problem  by  solv¬ 
ing  the  original  Kuhn-Tucker  equations  (2.6)  using  UNPACK  [8].  We  then 
a4justed  the  stop  tolerances  for  each  algorithm  so  that  the  infinity  norm  of  the 
reported  error  was  roughly  10'^.  We  summarize  the  results  of  the  experiments 
in  table  6.1. 

The  results  for  SOR  and  AOR  are  for  the  approximate  optimal  values  of 
the  iteration  parameters  (obtained  experimentally).  We  report  that  AOR  is 
highly  sensitive  to  the  choice  of  the  parameters:  very  small  deviations  from 
the  optimal  values  result  in  either  divergence  or  a  drastic  reduction  b  the  rate 
of  convergence.  In  all  three  test  problems,  the  region  of  convergence  in  the 
plane  l^>pears  to  be  a  tby,  narrow  crescent-shaped  region.  The  choice 
of  parameter  b  2-cyclic  SOR  is  consistent  with  the  theory  (see  (4)  and  [24]): 
convergence  otxurs  for  all  u;  bdow  a  sufficiently  small  critical  value,  with  the 
optimal  chotce  occurring  very  close  to  that  critical  value.  It  is  clear  front  the 
tables  that  these  test  prc^lems  do  not  satisfy  the  additions  in  Papadopoulou, 
Saridakis,  and  Papatheodorou  (28],  since  2-SOR  outperforms  8-AOR  by  a  wide 
mar^n.  Moreover,  br  each  of  these  problems,  all  of  the  conjugate  gradient 
algorithms  proved  vastly  superior  to  the  linear  stationary  methods. 
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Table  6.1:  Numerical  Results:  Small  Full-Rank  Structures  Problems 


WRENCH4 

(216  unknowns,  112  constraints) 


BNP  INMG  FWgt  INMI  2-SOR  3-AOR 


33 

33 

34 

40 

457 

2439 

2  proc  time 

.345 

.352 

.343 

.355 

3.58 

19.0 

1  proc  time 

.535 

.550 

.522 

.618 

5.75 

30.7 

Speedup 

1.55 

1.56 

1.52 

1.74 

1.61 

1.62 

DAM2 

(244  unknowns,  104  constraints) 


BNP  INMG  PWgt  INMI  2-SOR  3-AOR 


1  52 

49 

51 

78 

818 

5991 

2  proc  time 

.609 

.^71 

.791 

60.8 

1  proc  time 

.899 

.911 

.852 

1.30 

12.4 

9i.7 

Speedup 

1.51 

1.50 

1.49 

1.64 

1.50 

1.51 

SOLIDl 

(360  unknowns,  81  constraints) 


BNP  INMG  PWgt  INMI  2-SOR  3-AOR 


41 

41 

42 

88 

208 

609 

EUSSSCiSS 

.735 

.773 

.721 

1.32 

3.32 

9.56 

1  proc  time 

1.08 

1.13 

1.06 

2.06 

4.81 

14.0 

1.47 

1.46 

1.47 

1.56 

1.45 

1.46 

°  Minor  differences  occasionally  occur  when  changing  number 
of  processors.  Statistics  are  for  two-processor  runs. 
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Algorithms  BNP,  INMG,  and  PWgt  all  perfomied  comparably  on  these  prob¬ 
lems.  This  is  as  expected:  BNP  and  INMO  axe  solving  the  same  linear  system 
(§5.3),  and  both  BNP  and  INMG  are  essentially  the  limiting  case  of  PWgt 
(§4.3).  In  the  next  section,  however,  we  will  observe  that  these  algorithms 
behave  differently  on  larger  test  problems. 

Note  that  INMI  performed  measurably  slower  than  the  other  conjugate  gra¬ 
dient  algorithms,  especially  on  the  three-dimensional  problem  SOLIDl.  This 
is  consistent  vith  our  expectations:  while  INMI  is  an  order-reducing  method, 
the  algorithm  ignoies  all  information  in  the  matrix  G  when  constructing  the 
preconditioncr. 

On  ail  problems,  sp<^ups  are  quite  reasonable  given  the  large  transition 
and  imbalances  in  the  substructures.  They  are  also  consistent  across  the 
algorithms;  this  reflects  the  fact  ths^  the  principal  subroutines  are  similar  in 
each  of  the  codes. 

6.3  Larger  Full-Rank  Structures  Problems 

Having  validated  the  codes  on  the  small  test  problems  described  in  the  previous 
section,  we  then  experimented  with  larger  elastic  structures  problems;  we  report 
the  results  of  those  tests  here.  Because  of  the  poor  performance  of  2'SOR  and 
3*AOR  on  the  small  test  problenut,  and  the  difficulty  of  determining  appropriate 
iteration  parameten  for  these  algorithms,  we  did  not  test  the  linear  stationary 
methods  on  this  set  of  problems. 

Problem  DAMIO  (figure  6.2a)  is  similar  to  DAM2  (see  §6.2),  except  that 
there  are  more  elements  ir  the  model.  There  are  1,220  planar  elements,  pro¬ 
ducing  a  problem  with  2,440  constraints  and  6,020  unknowns.  We  consider  two 
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(a}DAM10  (b}S0UD2 

Figure  6.2:  Models  for  Large  FuU-Raok  Problems 

versions  of  the  problem:  one  involving  no  substnicturing  of  the  physical  domain, 
and  another  with  two  substructures  of  fairly  equal  size  (and  178  columns  in  the 
transition  zone).  Again,  we  deliberately  chose  a  transition  zone  consisting  of 
a  horizontal  strip  through  the  domain  (see  the  figure),  so  that  the  upper  sub¬ 
structure  would  be  unstable,  and  one  of  the  diagonal  blocks  in  the  equilibrium 
matrix  would  lack  full  row  rank. 

Problem  SOLjCD2  (figure  6.2b)  is  similar  to  SOLID  1  (see  §6.2),  except  that 
the  rectangular  solid  is  very  tall,  and  there  are  more  elements  in  the  model. 
There  are  220  free  nodes  and  660  tetrahedral  elements  in  this  model,  producing 
a  problem  with  660  constraints  and  3,960  unknowns.  As  with  DAMIO,  we 
consider  one  version  with  no  substructuring,  and  a  second  ver»on  with  two 
substructures.  Even  though  the  matrices  are  fairly  large  for  this  problem,  the 
geometry  of  the  model  does  not  ^ow  a  small  transition  zone:  there  are  360 
transition  columns,  which  is  almost  10%  of  the  total  number  of  columns  in  the 
equilibrimn  matrix. 
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As  with  the  problems  in  the  previous  section,  we  compared  the  solutions 
produced  by  each  algorithm  to  a  reference  vector  assumed  to  be  the  “true"  so¬ 
lution.  This  time,  however,,  the  problems  were  too  large  to  obtain  a  solution 
using  LINPACK  on  the  Kuhn-Tucker  equations.  Since  all  algorithms  produced 
results  consistent  with  the  LINPACK  solutions  on  the  smaller  versions  of  the 
structures  problems,  we  felt  confident  that  the  codes  were  working  correctly. 
Therefore,  we  solved  each  problem  using  algorithm  INMG  with  a  stop  tolerance 
of  £  =  10“*®,  and  used  the  resulting  solution  as  the  “true”  solution  when  com¬ 
puting  enors.  We  then  adjusted  the  stop  tolerances  on  each  algorithm  so  the 
reT*orted  error  (in  the  infinity  norm)  was  roughly  2x10"^. 

We  summarise  the  results  of  the  experiments  in  table  6.2.  Perhaps  the 
most  interesting  difference  between  these  results  and  those  in  table  6.1  concerns 
the  relative  performano:  of  BNP  and  INMG.  Despite  the  fact  that  the  two 
methods  solve  the  same  linear  system  in  essentially  the  same  way,  algorithm 
INMG  is  auperiof,  espedally  on  DAM’  0.  Here  we  see  clearly  for  the  first  time 
that  the  method  used  to  calculate  the  defect  in  algorithm  BNP  is  unstable:  it 
causes  inaccuracies  in  the  dire^on  vectors  and  scale  factors,  resulting  in  slower 
convergence  (see  §3.2).  If  we  replace  the  non-standard  calculation  of  the  defect 
with  the  more  conventional  update  based  on  explicit  use  of  a  direction  vector 
and  scale  factor,  the  rauK  ^  for  BNP  coincide  almost  exactly  with  INMG.  In 
this  case,  the  two  codes  are  I’m-for-iine  virtually  identical;  the  maj.  r  difference 
is  the  fact  that  algorithn  BNP  updates  the  original  unknown  y  at  each  iteration, 
while  INMG  recovers  y  after  iteration**  cease  (§5.3). 

If  we  think  of  algorithm  INMG  as  an  improved  version  of  BNP,  the  results 
for  PWgt  make  sense.  Since  BNP  (and  therefore  INMG)  is  the  limiting  case 
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Table  6.2:  Numerical  Results:  Large  Full-Rank  Structures  Problems 


DAMIO 

(6,020  unknowns,  2,440  constraints) 
Two  Substructures: 


BNP  INMG  PWgt  INMI 


1148 

910 

955 

1416 

2  proc  time 

282. 

227. 

232. 

355. 

1  proc  time 

399. 

603. 

Speedup 

1.71 

1.70 

1.72 

1.70 

No  Substructuring: 


BNP  INMG  PWgt  INMI 


Iterations 

937 

HU 

830 

1139 

1  proc  time 

304. 

312. 

444. 

SOLID2 

(3,960  unknowns,  660  constraints) 
Two  Substructures: 


BNP  INMG  PWgt  INMI 


223 

227 

689 

64.2 

63.3 

62.1 

191. 

1  1  proc  time 

86.4 

87.3 

277. 

1.35 

1.38 

1.41 

1.45 

No  Substructuring: 


BNP  INMG  PWgt  INMI 


Iterations 

453 

429 

432 

1146 

1  proc  time 

114. 

112. 

103. 

291. 

"Minor  differences  occasionally  occur  when  changing  number 
of  processors.  Statistics  are  for  two-processor  runs. 
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of  the  preconditioned  weighting  algorithm,  it  is  only  logical  that  the  results  for 
INMG  are  slightly  better  than  those  for  PWgt.  Because  BNP  in  its  original  form 
is  converging  more  slowly  than  it  ought  to  (see  previous  paragraph],  algorithm 
PWgt  outperforms  BNP  on  both  problems. 

Comparing  the  results  with  and  without  substructuring  is  also  quite  inter¬ 
esting:  the  problems  change  character  completely  when  one  changes  the  parti¬ 
tioning  of  the  physical  domain.  For  DAMIO,  convergence  is  significantly  better 
without  substructuring,  while  for  SOLID2,  it  it  significantly  better  with  two 
substructures.  On  both  problems,  however,  the  results  with  two  substructures 
on  two  processors  are  superior  to  those  obtained  with  one  processor  on  one 
substructure.  Speedups  (comparing  substructured  results  on  one  versus  two 
processors)  were  not  as  good  as  we  would  have  liked,  but  clearly  reflect  the 
size  of  the  transition  zone:  tests  on  S0LID2,  which  has  a  large  transition  zone, 
exhibit  significantly  poorer  parallel  performance. 

Once  again,  INMI  performed  badly.  This  was  true  even  when  we  experi¬ 
mented  with  various  types  of  scaling.  It  is  quite  dear  that  one  cannot  afford  to 
ignore  the  matrix  G  in  constructing  a  preconditioner  for  an  implidt  nullspace 
method. 


6«4  Rank-Deficient  Structures  Problems 

To  test  our  ability  to  solve  problem  LSE  when  the  matrix  G  lacks  full  column 
rmik,  we  modified  the  problems  described  in  the  previous  section  to  simulate  the 
presence  of  ‘^damaged”  elements.  We  did  this  by  defining  a  ‘‘damaged”  region 
in  each  model,  and  setting  Poisson’s  ratio  u  to  the  appropriate  critical  value  for 
each  element  in  the  region  (see  §2.2).  The  assodated  blocb  of  the  matrix  F 
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3 


(a)  OAMIOO 


(b)  SOUD2D 


Figure  6.3:  Models  for  Rank  Deficient  Problems 


are  then  non^negative  definite  and  singxilar,  so  the  corresponding  blocks  in  the 
generalized  Cholesky  factor  F  are  rank  deficient.  The  modified  regions  in  each 
problem  are  small  enough  to  ensure  that  hypothesis  H2  (§2.1)  is  still  satisfied, 
so  the  problems  remain  well  posed. 

For  the  reasons  described  in  chapter  5,  interlacing  with  rows  G  faib  for 
these  problems.  Thus,  algorithms  BNP  and  Pwgt,  at  least  as  we  have  in^le- 
mented  them,  cannot  be  used.  The  same  is  true  of  algorithm  INMG,  which  is 
essentially  an  improved  version  of  BNP.  We  are  left  with  algorithms  INMCI 
and  INMI. 

Problem  DAMlOD  (figure  6.3a)  is  a  modified  version  of  DAMIO  (see  §6.3); 
there  are  2,440  constraints  and  6,020  unknoums.  The  modified  region  (the  dark 
area  in  the  figure),  consists  of  100  elements,  or  8%  of  the  tot^  area  of  the  model. 
The  corresponding  blocks  in  F  and  Cr  are  5x5  with  rank  4.  Thus  the  rank  of  G 
is  5,91%;  this  is  100  short  of  the  total  number  of  columns  in  G.  Despite  a  rank 
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deficiency  of  100,  only  10  rows  of  the  scaled  identity  m;  .trix  were  needed  by  the 
INMGI  to  produce  an  upper  triangular  matrix  B\. 

Problem  SOLID2D  (figure  6.3b)  is  a  modified  version  of  SOLID2  (see  §6.3); 
there  axe  660  constraints  and  3,960  unknowns.  The  modified  region  (the  dark 
area  in  the  figure),  consists  of  20  elements,  or  3%  of  the  total  volume  of  the 
model.  The  corresponding  blocks  in  F  and  G  are  6x6  with  rank  5.  Thus 
the  rank  of  G  is  3,940;  this  is  20  short  of  the  total  number  of  columns  in  G. 
Algorithm  INMGI  required  12  rows  of  the  scaled  identity  matrix  to  complete 
the  construction  of  Bi. 

Once  again,  these  problems  are  too  large  to  obtain  a  solution  by  using  LIN- 
PACK  on  the  Kuhn-Tucker  equations.  We  obtained  solutions  consistent  with 
the  LINPACK  solutions  on  smaller  versions  of  these  problems,  so  we  felt  confi¬ 
dent  that  the  codes  were  working  correctly.  Therefore,  we  solved  each  problem 
using  algorithm  INMGI  with  a  stop  tolerance  of  e  s  10~*,  and  used  the  resulting 
soluticm  as  the  ‘Hrue’*  solution  when  computing  errors.  We  then  adjusted  the 
stop  tolerances  on  each  algorithm  so  the  repeated  error  (in  the  infinity  norm) 
was  roughly  10"^. 

Table  6.3  is  a  summary  of  the  results  for  these  test  problems.  While  both 
INMGI  and  INMI  successfully  solve  the  problems,  algorithm  INMGI  is  clearly 
superior.  Algorithm  INMGI  uses  as  much  inform^ion  as  possible  from  the 
matrix  G.  In  both  problems,  very  few  rows  of  the  identity  matrix  are  needed 
to  form  Bi,  so  algorithm  INMGI  is  ''almost’*  INMG  for  these  problems.  We  do, 
however,  note  that  it  is  necessary  to  scale  the  matrices  as  described  in  §6.3  to 
achieve  these  results:  when  the  rows  of  G  are  not  0(1)  (or,  equivalently,  if  we 
do  not  scale  the  identity  matrix  before  uring  it  for  interlacing),  the  performance 
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Table  6.3:  Numerical  Results:  Rank  Deficient  Structures  Problems 


DAMIOD 

6,020  unknowns,  2,440  constraints 
Bank  d*  (?:  5,920 

Two  Substructures: 


INMGI  INMI 


liSgnBili 

iw 

|||||||[|Q|g||| 

591.  ! 

[  1  proc  time  { 

mmumm 

luai 

■BH 

SOLlDaD 

(3,^  unknowns,  €60  constraints) 
RankdG:  3,940 

Two  SubstructuKis: 


mMGI  INMI 


1802 

mmm 

47S, 

692. 

“TiTr 

1.W  1 

*  Minor  differencei  occasionally  occur  wh*n  changing  number 
of  pTiicesson.  Statistics  are  f<^  two-processor  runs. 
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is  INMGl  is  unacceptable. 

6.5  Stokes  Flow 

To  test  our  ability  to  solve  saddle  point  problems,  we  experimented  with  the 
marker-and>^ell  fonnvdation  of  Stokes  Sow  on  the  unit  square  (§2.3).  As  men¬ 
tioned  in  chapter  4,  algorithm  BNP  cannot  be  used  to  solve  this  problem.  Al¬ 
gorithm  INMG,  which  is  the  nuHspace  version  of  BNP,  can  deal  with  it  easily. 
We  can  also  use  Pwgt,  INMI,  and  any  of  several  versions  of  algorithm  INMF. 
To  test  algorithm  INMF,  we  used  selected  rows  of  the  Cholesky  factor  of  the 
block  diagonal  pijt  of  F  to  augment  the  equilibrium  matrix.  RecalUng  horn 
§2.3  that  the  diagonal  blocks  of  F  are  tridiagonal,  this  means  the  code  used 
sriected  tows  of  the  Cholesky  ^tor  of  the  tridiagonal  part  of  F  to  form  the 
augomn^ot^  matrix.  We  did  not  fonn  F  explicitly  in  any  of  the  codes;  the 
subroutine  which  computes  the  matrix-vector  product  Fs  in  INMG,  INMF,  and 
INMI  \u>es  the  stencU  defining  the  action  of  F  on  an  arbitrary  vector  (again,  see 
§2.3).  This  routine  b  quite  fast,  and  explains  the  fact  that  the  time  required  for 
each  iteratioa  of  these  algorithms  b  about  half  that  required  for  an  iteration  of 
Pwgt. 

We  tested  both  a  50x50  ^rid  (tesuliing  in  a  problem  with  2,40§  pr^ure  con¬ 
straints  and  4,900  unknown  velocity  components),  and  a  100x100  grid  (9,999 
pressure  coastrmnts  and  19,800  unknowas).  We  each  problem  with  and 
without  substructuring:  there  were  two  sub-domains  in  the  substructured  ver¬ 
sions.  Algorithms  INMG  and  PWgt,  which  require  that  the  matrix  G  reSect  the 
substructuring,  emidoy  the  wide  transition  zone  described  in  §2.4.  Algorithms 
INMI  and  INMF  do  not  use  G;  we  IheLcfore  used  the  narrow  transition  aone 
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described  in  that  section.  In  both  cases,  the  transition  zone  constitutes  a  small 
percentage  of  the  columns  of  E. 

We  used  the  following  artificial  ‘^ow*  as  our  test  problem: 

vi  =  2xcosy 
=  — as^siny 
p  =  xy\ 

Here  vi  and  va  are  the  horizontal  and  vertical  components  of  the  (continuous) 
velocity  vector,  and  p  represents  pressure.  The  velocity  has  non-zero  divergence, 
but  is  irrotational  (curl  u  =  0).  In  fairness,  we  report  that  we  have  been  unable 
to  use  our  codes  to  solve  for  rotational  flows:  on  several  problems  involving 
non-zero  curl,  we  obtained  errors  (based  on  the  true  continuous  solution)  on 
the  order  of  2x  10*^  regardless  of  the  spedfied  mesh.  We  have  not  been  able  to 
determine  the  cause  of  this  behavior. 

We  used  the  continuous  solution  as  given  above  to  measure  the  error  ob* 
tained  by  each  algorithm.  We  varied  the  stop  tolerances  for  each  algorithm  to 
determine  the  smallest  attainable  error  on  the  given  grid,  and  then  ran  the  tests 
of  each  algori&hm  with  the  largest  stop  tolerance  which  produced  this  error.  For 
the  50x50  grid,  the  smallest  attainable  error  (measured  in  the  infinity  norm) 
was  roughly  6x10'^  for  all  algorithms;  for  the  100x100  grid,  the  best,  attain¬ 
able  error  was  roughly  SxlO*^.  Thus,  the  discretization  appears  to  be  an  0{h) 
gjbbal  approximation  to  the  continuous  problem. 

Table  6.4  summarizes  the  results  of  the  experiments.  We  observe  that  the 
tri-diagcmal  portion  of  F  produces  a  poor  preconditioner  in  INMF.  It  seems  we 
cannot  afford  to  ignore  the  information  in  the  outer  bands  ol  F  when  construct- 
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Table  6.4:  Numerical  Results:  Stokes  Flow  on  Unit  Square 


50x50  Grid 

(44KM)  unknowns,  2,499  constraints) 
Two  Substructures: 


INMG  PWgt  INMI  INMF 


MLTMmi 

354 

433 

1601 

1637 

EJJ33ES3 

69.4 

85.6 

127. 

157. 

1  proc  time 

127. 

157. 

224. 

283. 

1.83 

1.83 

1.76 

1.80 

No  Substructurin^: 


INMG  PWgt  INMI  INMF 


derations 

IKS 

MSM 

1605 

1681  1 

grfwiTi^i 

\mmm 

mmm 

214. 

lOOxlOO  Grid 

(19300  ttnknowtts,  9309  ooBstraints) 
Two  Substructures: 


INMG  PWgt  INMI  INMF 


Iterations*  | 

1  1132 

1370 

mmm 

8352 

iafgtigE)'.!!a' 

immm 

mfvm 

129i 

(jj223(2S23i  1 

|glW 

mmm 

BSOli 

iMEMI 

1.89 

1  1.73  1 

1.79  i 

No  Substmcturiag: 


INMG  PWgt  INMI  INMF 


iterations 

iKai 

594 

9721 

8503 

1  proc  time 

1  274. 

886. 

4990. 

5930. 

*Minor  differences  occasioaaJly  occur  when  chan^ng  number 
of  processors.  SUristici  are  for  tWo-piocMsor  runs. 
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ing  the  preconditioner,  especially  when  we  are  only  using  selected  rows  of  the 
factorization  we  obtain.  INMI  also  performs  poorly,  as  it  did  on  all  other  test 
problems. 

Algorithm  INMG,  on  the  other  hand,  performs  fairly  wdl;  the  natural  factor 
G  of  the  L^)lacian  matrix  F  (see  §2.3)  allows  us  to  take  advantage  of  some 
outer  band  information  in  constructing  the  preconditioner.  The  difference  in 
iterations  between  INMG  and  PWgt  is  greater  here  than  it  is  for  any  other 
class  of  problems;  the  substantial  difference  in  total  time  is  due  to  INMG's 
efficient  matrix>vector  multiplier  (based  on  the  stencil  defimng  F). 

Notice  we  obtain  quite  reasonable  speedupa  on  the  Stokes  problem:  the 
transition  zones  are  fairly  small,  and  the  subdomains  have  virtually  identical 
structure.  On  the  other  hand,  the  results  for  INMG  without  domain  decom* 
positirm  reflect  an  extraordinary  reduction  in  the  number  of  iterations.  The 
change  in  the  rate  of  convergence  b  so  dramatic  that  the  time  required  to  solve 
the  problem  on  one  prooesaor  without  domain  deocmpoaition  b  almost  half  that 
needed  for  the  aubatructured  problem  on  two  piocessora.  Clearly  one  would  need 
a  more  aophbticated  approach  to  conatructing  to  overcome  the  penalty  as- 
aociated  with  domain  decomposition  ^plied  to  the  Stokes  problem.  Incomplete 
Cholesky  decompositions  offer  one  possibility  worth  exploring. 

We  note  that  we  made  no  attempt  to  exploit  the  underlying  continuous 
problem  to  constnict  a  plausible  i^arting  vector  for  the  iterations.  We  were 
interested  in  the  dbcretized  Stokes  problem  only  as  an  example  of  a  linear 
system  b  saddle  pout  (asm.  The  experiments  bdic^  that  we  have  in  fact 
sucoseded  in  extending  BNP  to  a  class  of  algorithms  capride  of  solving  such 
systems. 
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7.  Conclusions 


The  analyaia  and  experiments  described  in  this  thesis  hardly  constitute  an  ex¬ 
haustive  study  q£  the  subject  at  hand.  But  the  work  does  seem  to  make  a  con¬ 
vincing  case  that  algorithm  BNP  in  particular,  and  order-reducing  conjugate 
gradOient  methods  in  general,  ofi'er  a  promising  alternative  to  existing  methods 
for  solving  large,  sparse  constrained  minimiaation  problems. 

The  proofs  that  algorithm  BNP  is  superior  to  p-cyclic  SOR  [4]  and  block 
AOR  (chapter  4)  held  oiUy  in  exact  arithmetic.  But  the  results  are  decisive 
for  the  test  probleios  we  have  considered:  even  the  poorest  implicit  nullspace 
method  proved  faster  than  optimal  2-SOR  and  3*AOR  by  a  wide  margpn.  Addi¬ 
tionally,  order-reducing  conjugate  i^adieot  methods  tree  the  user  of  the  burden 
of  chosing  iteration  parameters.  It  is  (}uite  difficult  to  determine  the  optimal 
parameter{s)  in  both  p-cyclic  SOR  and  block  AOR.  In  the  case  of  3-AOR,  it 
may  be  hard  to  select  parameters  which  the  algorithm  even  converges.  If 
anything,  the  exparimeats  may  understate  the  advantage  of  the  conjugate  gra¬ 
dient  methods,  since  production  codes  would  notmally  ne^  to  detennine  the 
SOR  and  AOR  parameters  adaptively. 

By  esUblishing  that  algorithm  BNP  is  the  limiting  case  of  the  pitconditioocd 
weighting  method,  we  have  shown  that  BNP  is  competitive  with  yet  another 
aj^proach  to  solving  problem  LSB.  Again,  the  experimental  evidence  supports 
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the  analysis.  For  ^-uch  of  the  problems  we  considered,  algorithm  INMG,  which 
is  essentially  an  improved  version  of  BNP,  achieved  performance  comparable  to 
(and  generally  faster  than)  preconditioned  weighting. 

Perhaps  the  most  encouraging  aspect  of  this  work  is  the  characterization 
of  algorithm  BNP'  as  a  nuUspacc  method,  and  the  resulting  extension  to  more 
general  implicit  nuUspace  methods.  We  obtain  a  class  of  methods  capable  of 
solving  problems  for  which  algorithm  BNP  does  not  appear  to  be  well  suited, 
including  saddle  point  problems,  and  problems  in  which  G  lacks  full  column 
rank.  Moreover,  for  problems  reflecting  a  substructuring  of  the  physical  domain, 
implicit  mdlspace  methods  offer  opportunities  for  parallel  computation. 

In  addressing  the  question  of  how  to  construct  an  augmentation  matrix  which 
leads  to  an  effective  implicit  nuilspace  method,  we  have  merely  scratched  the 
surface:  of  the  methods  we  have  tested,  only  those  which  depend  heavily  on  rows 
of  G  (algorithms  INMG  and  INMGI)  show  potential  as  truly  robust  algorithms. 
But  flexibility  in  the  choice  of  the  augmentation  matrix  M\  appears  to  be  the 
greatest  strength  of  the  approach  we  describe.  We  strongly  believe  that  further 
research  on  more  effective  ways  to  generate  the  augmentation  matrix  will  prove 
fruitful. 
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9.  Appendix:  The  Breakdown  of 
Incomplete  QR  Factorizations 


The  research  which  led  to  this  dissertation  began  with  a  look  at  incomplete  QR 
preconditioners  for  ordinary  least  squares  problems.  We  quickly  developed  an 
interest  in  order«reducing  conjugate  gradients  for  constrained  problems,  leaving 
unfinished  the  work  on  incomplete  QR  factorizations.  Before  this  change  in 
direction,  however,  we  encountered  some  simple  examples  of  a  mechanism  which 
can  cause  incomplete  QR  factorizations  to  break  down.  We  felt  these  examples 
were  interesting  enough  to  deserve  a  separate  discussion;  we  include  them  in 
this  appendix.  The  notation  throughout  the  appendix  is  independent  of  the 
preceding  chapters. 

9.1  Preconditioning  with  Incomplete  Factor¬ 
izations 

We  consider  the  following  ordinary  least  squares  problem:  given  an  mxn  matrix 
A  with  full  column  rank,  and  an  mxl  vector  6, 

minimize  ||Ax  -  6||2.  (9.1) 

The  unique  solution  x  satisfies  the  so-called  normal  equations: 

A^Ax  ss  A^b,  (9.2) 
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Notice  that  the  coefficient  matrix  A^A  is  symmetric  positive  definite  when  A 
has  full  colunm  rank. 

There  are,  however,  a  number  of  well-known  difficulties  associated  with  using 
the  normal  equations  (see,  for  example,  Bjorck  [7]  or  Golub  and  van  Loan  [12]). 
These  include  the  conditioning  of  the  problem  (the  spectral  condition  number 
of  A^A  is  the  square  of  the  condition  number  of  A  itself),  and  the  0(n®)  cost 
^d  loss  of  information  associated  with  forming  A^A.  We  can  avoid  some  of 
these  problems  by  leaving  the  normal  equations  in  factored  form  and  solving  the 
problem  iteratively  (using,  for  example,  a  conjugate  gradient  algorithm).  But 
convergence  of  the  conjugate  gradient  algorithm  still  depends  on  the  condition 
number  of  A^A,  so  we  generally  need  to  precondition  the  normal  equations  to 
have  any  hope  of  producing  an  effective  algorithm. 

Now  suppose  we  have  an  orthogonal  factorization  of  A  given  by  A  s  QR, 
where  the  matrix  Q  is  orthogonal,  and  R  is  upper  triangular.  Recall  that 
R'^^R  ss  A^A;  in  fact,  R  is  (up  to  changes  in  the  signs  of  the  rows)  precisely  the 
Cholesky  factor  of  A^A.  Given  such  a  factorization,  we  could  “precondition” 
the  normal  equations: 

Rr'^A^ARr'w  =  RT^AFb,  where  w  «  Rx.  (9.3) 

But  ARr^  =  Q,  and  Q^Q  =  /,  so  this  reduces  to  u;  «  R'"‘^A^b. 

Of  course,  there  is  no  need  to  iterate  on  such  a  trivial  system;  we  have  in 
fact  described  a  way  to  use  a  QR  factorization  to  solve  a  least  squares  problem 
directly.  But  if  we  did  apply  a  conjugate  gradient  algorithm,  we’d  achieve  “con¬ 
vergence”  in  one  iteration.  Thus,  in  a  sense,  the  Cholesky  factor  of  A^A  is  the 
“perfect”  preconditioner,  and  the  thought  experiment  motivates  an  approach  to 
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preconditioning  the  normal  equations  (9.2).  Let  R  be  some  non-singular  upper 
triangular  matrix  which  is  in  some  sense  a  rough  approximation  of  the  Cholesky 
factor  R  of  A^A.  We  are  tempted  to  ask  whether  R  may  be  a  good  precon¬ 
ditioner  for  the  normal  equations:  perhaps  AR~^  is  “roughly”  orthogonal,  and 
the  matrix  R~'^A'^AR~^  is  a  crude  approximation  of  the  identity  matrix  1.  If 
so,  then  the  preconditioned  system 

R~^A^ARr^w  =  k~^A^b  where  w  ^  Rx  (9.4) 

ought  to  be  well-suited  for  the  conjugate  gradient  algorithm. 

9.2  A  Strategy  for  Incomplete  QR 

One  way  to  form  the  matrix  A  is  to  use  a  so-called  incomplete  Cholesky 
factorization  (see,  for  example,  Meijerink  and  van  der  Vorst  [25],  or  Ortega 
[26]).  Such  factorizations  compute  a  conventional  Cholesky  factorization  of  a 
symmetric  positive  definite  matrix,  but  retain  only  selected  non-zeros  as  the 
factorization  proceeds.  One  could  use  a  strategy  based  on  the  size  of  the  non¬ 
zeros,  keeping  only  elements  larger  than  some  fixed  or  varying  drop  tolerance. 
Alternatively,  one  might  instead  employ  a  method  based  on  the  position  of  the 
non-zeros;  here,  one  selects  a  predetermined  non-zero  pattern  for  the  approx¬ 
imate  Cholesky  factor  ky  preserving  only  those  non-zeros  which  conform  to 
the  selected  pattern.  More  sophisticated  approaches,  including  combinations  of 
these  methods,  are  of  course  possible  (see,  for  example,  Manteufiel  [23]  for  an 
algorithm  which  includes  a  shifting  of  the  coefficient  matrix). 

The  incomplete  Cholesky  factorization  is  certainly  a  promising  approach  to 
preconditioning  symmetric  positive  definite  systems  when  the  coefficient  matrix 
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is  already  explicitly  available.  For  problems  iavolving  normal  equations,  though, 
we  generally  want  to  avoid  forming  A^A.  For  that  reason,  we  might  take  a 
somewhat  different  approach  to  constructing  the  preconditioner  R:  we  could 
apply  orthogonal  rotations  to  A  as  if  we  were  accomplishing  a  QR  factorization, 
but  again  preserve  only  selected  non-zeros.  As  with  incomplete  Cholesky,  we 
can  accomplish  such  an  incomplete  QR  factorization  (IQR)  by  using  drop 
tolerances,  specified  non-zero  patterns,  or  other  more  elaborate  strategies  (see, 
for  example,  Jennings  and  Ajiz  [21],  or  Zlatev  and  Nielsen  [40]). 

Regardless  of  the  drop  strategy,  there  is  a  question  of  existence  to  address. 
Certainly  a  complete  orthogonal  factorization  in  exact  arithmetic  will  produce  a 
non-singular  upper  triangular  matrix  R.  But  once  we  begin  dropping  non-zeros, 
we  may  in  fact  introduce  linear  dependence  in  the  columns  of  the  reduced  ma¬ 
trix,  preventing  us  from  generating  a  non-singular  preconditioner  R.  Jennings 
and  Ajiz  [21]  describe  a  sophisticated  strategy  based  on  drop  tolerances,  and 
prove  that  it  produces  a  non-singular  preconditioner.  They  produce  a  similar 
result  for  a  strategy'  based  oc  Gram-Schmidt  orthogonalization  (see  also  Saad 
[33]).  We  examine  here  a  strategy  which  uses  Givens  rotations  and  a  drop 
strategy  based  on  the  position  of  non-zeros. 

Begin  by  agreeing  to  confine  non-zeroe  in  the  upper  triangular  matrix  k  to 
those  positions  in  which  A^A  has  non-zeros.  We  determine  the  non-zero  pattern 
of  A^A  by  symbolic  matrix  multiplication.  More  precisely,  columns  t  and  j  of 
the  matrix  A  are  said  to  be  symbolically  interactive  (or  simply  interactive) 
if  there  exists  a  k  such  that  both  au  and  akj  are  non-zero.  In  this  case,  the 
product  A^A  is  said  to  have  a  symbolic  non-zero  in  position  (t,  j).  If  no  such 
k  exists,  we  call  the  columns  symbolically  non-interactive  (or  simply  non- 


117 


interactive),  and  we  note  that  A^A  has  a  (symbolic)  zero  in  position  (»,i).  This 
approach  to  defining  the  non-zero  pattern  of  A^A  (and  A)  amounts  to  assuming 
that  no  cancellation  occurs  in  forming  the  product.  Of  course,  since  R  is  upper 
triangular  (smd  A^A  is  symmetric  positive  definite),  we  need  only  consider  the 
case  t  <  j  when  performing  the  symbolic  multiplication.  The  diagonal  elements 
of  the  product  are  always  symbolically  non-zero  (in  fact  they  are  niunerically 
positive),  because  the  columns  of  A  interact  with  themselves. 

Now  apply  the  following  reduction  to  A; 

Incomplete  QR  Factorization 

1.  Use  symbolic  matrix  multiplication  to  determine  the  non-zero  data  struc¬ 
ture  of  the  upper  triangular  portion  of  A^A.  Use  this  information  to  fix 

« 

a  static  data  structure  for  the  approximate  Cholesky  factor  R. 

2.  For  targetrov  a:  1, . . . ,  n: 

(a)  For  pivotrotf  =  1, . . .,  (targetrow-1): 

•  If  necessary,  rotate  pivotroe  against  targetrov,  annihilating  the 
leading  non-zero  in  position  k  =  pivotrov  of  targetrov. 
e  Accumulate  all  fill  in  targetrov;  i.e.  drop  no  non-zeros  produced 
in  targetrov  by  the  Givens  rotation, 
e  Retain  non-zeros  in  pivotrov  only  if  they  occur  in  positions  corre¬ 
sponding  to  symbolic  non-zer(»  in  the  upper  triangular  portion 
of  A^A. 

(b)  Now  that  targetrov  has  been  completely  reduced,  retain  only  those 
non-zeros  which  occur  in  positions  corresponding  to  symbolic  non- 
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zeros  in  the  upper  triangular  portion  of  A^A.  Store  the  selected  non¬ 
zeros  of  this  fully  reduced  row  in  the  static  data  structure  reserved 
for  R. 

Thus  we  allow  complete  fill  in  a  target  row  while  reducing  that  row,  but 
restrict  fill  in  the  pivot  rows  to  positions  corresponding  to  symbolic  non-zeros 
of 

In  addressing  whether  this  algorithm  successfully  produces  a  non-singular 
upp^  triangular  R,  one  might  first  ask  the  same  question  about  incomplete 
Cholesky  factorizations.  It  has  long  been  known  that  incomplete  Cholesky 
may  break  down  when  the  drc^  strategy  b  based  on  the  non-zero  pattern  of 
A^A\  Kershaw  [22]  provides  a  simple  example.  If  A^A  is  either  diagonally 
dominant  or  an  M-matrix,  however,  thb  form  of  incomplete  Cholesky  always 
produces  a  non-singular  preconditioner  (in  fact,  a  slightly  more  general  result 
holds;  see  Manteuffel  [23]).  One  ought  be  led  to  expect  similar  results  for  IQR. 
We  demonstrsto  in  the  next  section  that  thb  b  not  the  case:  the  IQE  algorithm 
described  above  can  in  fact  break  down,  even  if  A^A  b  diagonally  dominant  or 
an  M-matiix. 

9.3  Examples  of  Incomplete  QR  Breakdown 

In  thb  section  we  demonstr^  by  example  that  the  IQR  algorithm  described 
above  can  in  fact  break  down,  even  under  fairly  rr^i^rictive  conditions. 

Begin  with  the  foilowmg  4x4  matrix  A: 


1  1 

0 

0  ' 

0  1 

0 

1 

1  0 

1 

1 

.0  6 

0 

2. 
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(9.5) 


The  matrix  has  full  column  rank,  so  A^A  is  symmetric  positive  definite.  (/I  is 
square  merely  to  keep  the  example  small;  one  could  easily  construct  rectangular 
matrices  to  achieve  the  effects  described  below.) 

We  now  apply  the  IQR  factorization  described  in  the  previous  section.  Form¬ 
ing  the  symbolic  product  A^A  gives  us  the  non-zero  pattern  for  Ri 


*  *  *  * 
*  0  * 
*  * 


(9.6) 


Note  that  there  are  symbolic  non-zeros  in  every  position  except  (2, 3). 

We  now  detail  the  reduction.  The  non-zeros  in  row  1  of  A  conform  to  the 
required  non-zero  pattern,  so  we  retain  row  1  as  is.  Row  2  has  a  zero  in  column 
1,  so  no  reduction  is  necessary.  Agm,  its  non-zeros  conform  to  the  required 
pattern,  so  we  retam  all  the  non-zeroa.  Now  let  row  3  be  the  target  row.  Begin 
by  using  row  I  as  the  pivot  row  to  annihilate  the  (3, 1)  element.  Applying  the 
appropriate  Givens  rotation  produces 


pivot 


1.4142 


target  -«> 


0 

0 

0 


0.7071 

1.0000 

-0.7071 

6.0000 


0.7071  0.7071  ■ 
0  1.0000 
0.7071  0.7071 
0  2.0000 , 


(9.7) 


Our  strategy  calls  for  preserving  all  elements  in  the  target  row  until  the 
reduction  that  row  is  complete,  so  at  this  point  we  discard  no  elements  in 
row  3.  The  non-zeros  in  the  pivot  row  conform  to  the  required  pattern,  so 
we  preserve  all  elements  there  as  well.  Thus,  up  until  now,  our  'incomplete” 
factorization  coincides  with  a  classical  QR  factorization.  We  are  not  finished 
reducing  row  3,  however.  Now  use  row  2  as  the  pivot  row  to  annihilate  the  (3, 2) 
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element.  The  complete  Givens  rotation  (no  discarded  elements)  looks  like  this: 


■  1.4142  0.7071  0.7071  0.7071  ' 

pivot 0  1.2247  -0.4082  0.4082 

target  -+  0  0  0.5774  1.1547 

0  6.0000  0  2.0000 


(9.8) 


Recall,  however,  that  there  is  a  symbolic  zero  in  position  (2,3),  so  we  must 
discard  the  (2, 3)  element  in  the  pivot  row  (in  practice,  we  do  not  even  generate 
it).  We  also  note  that  the  reduction  of  target  row  3  is  now  complete,  so  we 
check  to  see  if  we  must  discard  any  non-zeros.  The  non-zeros  in  row  3  conform 


to  the  required  non-zero  pattern,  so  we  preserve  all  the  non-zeros  in  this  row. 


A 

Thus,  when  the  reduction  of  target  row  3  is  complete,  the  matrix  R  looks  like 


this: 


target  -+ 


1.4142  0.7071  0.7071  0.7071  ' 
0  1.2247  0  0.4082 

0  0  0.5774  1.1547 

0  6.0000  0  2.0000 


(9.9) 


Discarding  the  (2,3)  entry  has  created  a  dependency  which  wiU  become 
apparent  on  the  next  rotation.  Row  4  now  becomes  the  target  row;  using  pivot 
row  2  to  annihilate  the  leading  non-zero  produces  the  singular  matrix  below: 


■  1.4142  0.7071  0.7071  0.7071 
pivot  -*  0  6.1237  0  2.0412 

0  0  0.5774  1.1547 

target  .  0  0  0  0 


(9.10) 


The  algorithm  has  failed  to  produce  a  non-singular  upper  triangular  k. 

It*8  helpftd  to  examine  how  we  constructed  the  example.  The  3x3  principal 
suhmatrix  in  the  upper  left  comer  df  (9.10)  b  non-singular,  so  its  odumns  span 
In  particular,  the  vector  consbting  of  the  first  three  entries  of  the  last 
column  of  (9.10)  can  be  written  as  a  unique  Uneax  combination  of  the  a>lumns 
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of  this  submatrix.  Thus 

■  1.4142  r  0.7071  r  0.7071  ]  0.7071  ” 

oi  0  +03  6.1237  +  03  0  =  2.0412  (9.11) 

0  J  L  0  J  0-5774  J  1.1547  J 

for  some  unique  choice  of  oti,  03,  and  03.  Now  choose  the  elements  in  the 
fourth  row  of  the  ori^al  matrix  A  so  that  the  entries  in  the  row  satisfy  the 
same  relatioi^p: 

01041  +  03O43  +  03043  =  044.  (9.12) 

This  ensures  the  matrix  lacks  full  column  rank  ajitr  the  element  in  position  (2, 3) 
is  discarded.  If  we  choose  the  entries  in  this  row  so  that  (9.12)  is  satisfied  and 
the  original  matrix  has  full  column  rank,  we  succMd  in  producing  an  example 
of  breakdown. 

Note  that  this  construction  is  not  overly  dependent  on  the  fact  that  we  are 
using  the  tton*zeto  pattern  of  A^Ai  pven  another  strategy  based  on  a  specified 
aoU'Sero  pattern,  we  may  be  abte  to  obstruct  an  example  of  breakdown  in  a 
similar  manner.  We  need  only  begin  to  reduce  a  simple  example,  note  the  stage 
at  which  the  first  element  b  discarded,  and  adjust  the  values  of  the  elements 
bcbw  the  cuiTOQt  target  row  to  create  a  column  dependency,  (We  should  also 
observe  in  passing  Uiat  a  dropped  element  can  have  the  opposite  effect,  changing 
a  dependent  set  of  vectors  into  an  independent  set.) 

We  note  several  other  interesting  facta  about  the  algorithm  and  the  break¬ 
down  mechanism: 

1.  Roto  pennutations  may  or  may  not  pnvtnt  breakdown.  If  we  permute  rows 
2  and  4  in  the  matrix  A  above,  the  incomplete  factorization  terminates 
normally.  But  there  b  nothing  specif  about  thb  permutation:  rows  2 
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and  4  have  the  same  non^zero  pattern,  and  we  could  just  as  easily  have 
chosen  numerical  values  to  cause  breakdown  for  this  permutation. 

2.  There  is  at  least  one  situation  tohich  guarantees  a  successful  factorization. 
Suppose  we  are  given  a  matrix  A  with  many  more  rows  than  columns, 
and  we  partially  reduce  the  matrix  until  reaching  the  form 

(9.13) 

where  is  upper  triangular  and  non>singular.  Then  subsequent  incom¬ 
plete  factorization  cannot  create  dependent  columns  (in  fact,  continued 
reduction  cannot  reduce  the  singular  values  of  Ri).  This  is  because  a 
Givens  rotation  which  uses  a  leading  non-zero  r  in  a  pivot  row  of  A)  to 
annihilate  a  leading  non-zero  a  in  a  target  row  of  Az  will  always  leave 
the  larger  non-zero  value  on  the  diagonal  of  the  pivot  row.  This 

fact  could  prove  useful  in  problems  involving  updoita^  if  we  start  with  a 
matrix  A  and  an  approximate  k,  then  introduce  new  observations  (rows), 
we  can  update  R  using  IQR  without  fear  of  breakdown. 


3.  We  can  place  much  stronger  conditions  on  the  matrir  A  and  still  esperi- 
ence  breaJidovn.  Comnder,  for  example,  the  matrix 


9  ~3 
0  9 
3  0 
0  -7 


0  0 
0  -12 
9  3 

0  9 


(9.14) 


In  this  case,  is  strictly  diagonally  dominant,  but  breakdown  occurs 


exactly  as  it  did  in  the  example  above.  Similarly,  the  matrix 

-5  5  0  10 

0  5  0  -20 

”  5  0-5  5 

.0-30  5. 


(9.15) 
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leads  to  a  breakdown,  despite  the  fact  that  A^A  is  an  M-matrix. 

Finally,  we  report  that  breakdowns  such  as  these  do  in  fact  occur  in  practical 
problems.  We  have  tested  the  IQR  factorization  described  in  this  section  on 
the  four  sparse  rectangular  matrices  WELL1033,  WELL1850,  ILLC1033,  and 
ILLC18S0  from  the  Harwell-Boeing  test  collection  (see  Duff,  Grimes,  and  Lewis 
[9]),  and  observed  breakdowns  on  WELL1850  and  ILLC1850.  We  were  able  to 
prevent  breakdown  by  reordering  the  rows  of  the  matrices  in  stairstep  fashion 
according  to  the  positions  of  the  leading  non-zeros.  Unfortunately,  though, 
simply  producing  a  non-singular  R  is  not  enough.  For  each  of  the  four  matrices 
we  tested,  the  quality  of  the  IQR  preconditioner  was  marginal  at  best;  we  have 
not  yet  produced  resvdts  which  are  fast  enough  to  be  competitive  with  any  of 
the  established  iterative  methods. 
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